Delay Comparison of Delivery and Coding Policies in Data Clusters
A key function of cloud infrastructure is to store and deliver diverse files, e.g., scientific datasets, social network information, videos, etc. In such systems, for the purpose of fast and reliable delivery, files are divided into chunks, replicated or erasure-coded, and disseminated across servers. In general, it is not known how delays scale with the size of a request, nor how delays compare under different policies for coding, data dissemination, and delivery. Motivated by these questions, we develop and explore a set of evolution equations as a unified model that captures the above features. These equations allow for both efficient simulation and mathematical analysis of several delivery policies under general statistical assumptions. In particular, we quantify in what sense a workload-aware delivery policy performs better than a workload-agnostic policy. In a dynamic or stochastic setting, the sample-path comparison of these policies does not hold in general. The comparison is shown to hold under the weaker increasing convex stochastic ordering, which is still stronger than a comparison of averages. This result further allows us to obtain insightful computable performance bounds. For example, we show that in a system where files are divided into chunks of equal size, replicated or erasure-coded, and disseminated across servers at random, job delays increase sub-logarithmically in the request size for small and medium-sized files but linearly for large files.
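As a rough illustration of the fork-join structure behind these questions, the following Python sketch simulates a hypothetical replication-based delivery model in which a job requesting k chunks reads each chunk from a distinct, randomly chosen server and completes when the slowest read finishes. The model, the exponential service times, and all parameters are illustrative assumptions for this sketch, not the evolution equations or policies analyzed in the paper.

```python
# Toy fork-join delivery model (illustrative assumption, not the paper's model):
# a job needing k chunks forks one read per chunk onto k servers chosen
# uniformly at random and waits for the slowest read to finish.
import math
import random


def job_delay(k, num_servers=100, mean_service=1.0, rng=random):
    """Delay of one job whose k chunks sit on k distinct random servers."""
    servers = rng.sample(range(num_servers), k)           # random dissemination
    chunk_delays = [rng.expovariate(1.0 / mean_service)   # per-chunk read time
                    for _ in servers]
    return max(chunk_delays)                              # job ends with slowest chunk


def mean_delay(k, trials=10_000):
    """Monte Carlo estimate of the average job delay for request size k."""
    return sum(job_delay(k) for _ in range(trials)) / trials


if __name__ == "__main__":
    for k in (1, 2, 4, 8, 16, 32, 64):
        print(f"k={k:3d}  mean delay ~ {mean_delay(k):.3f}  (log k = {math.log(k):.3f})")
```

This toy model only shows how a request-size-dependent maximum over chunk delays can be simulated; the paper's scaling results (sub-logarithmic for small and medium files, linear for large files) are derived from its evolution equations under more general assumptions, not from this simplified setup.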