Capacity of Distributed Storage Systems with Clusters and Separate Nodes
In distributed storage systems (DSSs), the optimal tradeoff between node storage and repair bandwidth is central to designing distributed coding strategies that ensure data reliability at large scale. The capacity of a DSS is obtained as a function of the node storage and repair bandwidth parameters and thereby characterizes this tradeoff. Many works study DSSs with clusters (racks), in which intra-cluster and cross-cluster repair bandwidths are differentiated. However, although separate nodes are also prevalent in practical DSSs, DSSs with clusters and separate nodes (CSN-DSSs) have received little attention. In this paper, we formulate, for the first time, the capacity of CSN-DSSs with one separate node, where the bandwidth used to repair a separate node is cross-cluster bandwidth. From this capacity, the optimal tradeoff between node storage and repair bandwidth is derived and compared with that of cluster DSSs. A regenerating code instance is constructed based on the tradeoff. Furthermore, the influence of adding a separate node is analyzed and formulated theoretically. We prove that when each cluster contains R nodes and any k nodes suffice to recover the original file (MDS property), adding an extra separate node preserves the capacity if R | k (R divides k) and reduces it otherwise.
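As a rough illustration of what "capacity as a function of node storage and repair bandwidth" means, the sketch below evaluates the well-known cut-set bound for the classical homogeneous model, which has no clusters and no separate nodes. It is not the CSN-DSS capacity formulated in this paper, which the abstract does not state. The names `alpha` (per-node storage), `beta` (bandwidth per helper node) and `d` (number of helper nodes) are assumptions introduced here for illustration. The second helper only encodes the divisibility condition R | k quoted in the abstract.

```python
def classical_capacity(k, d, alpha, beta):
    """Cut-set capacity bound of the classical homogeneous DSS model.

    Assumption: no clusters and no separate nodes; this is NOT the
    CSN-DSS capacity from the paper. A newcomer downloads beta from
    each of d helpers, and any k nodes must recover the file.
    """
    if not (1 <= k <= d):
        raise ValueError("expected 1 <= k <= d")
    # The i-th node contacted contributes min(alpha, (d - i) * beta).
    return sum(min(alpha, (d - i) * beta) for i in range(k))


def separate_node_keeps_capacity(R, k):
    """Condition stated in the abstract: capacity is kept iff R divides k."""
    if R <= 0 or k <= 0:
        raise ValueError("R and k must be positive")
    return k % R == 0


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to exercise the functions.
    print(classical_capacity(k=2, d=3, alpha=3, beta=1))
    print(separate_node_keeps_capacity(R=3, k=6))
    print(separate_node_keeps_capacity(R=3, k=7))
```

Sweeping `alpha` and `beta` at a fixed capacity traces the storage-bandwidth tradeoff curve for the classical model; the paper derives the analogous curve for CSN-DSSs.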