Lower Bounds and Nearly Optimal Algorithms in Distributed Learning with Communication Compression

by Xinmeng Huang, et al.

Recent advances in distributed optimization and learning have shown that communication compression is one of the most effective means of reducing communication. While there have been many results on convergence rates under communication compression, a theoretical lower bound is still missing. Analyses of algorithms with communication compression have attributed convergence to one of two abstract properties of the compressor: the unbiased property or the contractive property. Compressors with either property can be applied with unidirectional compression (only messages from workers to server are compressed) or bidirectional compression. In this paper, we consider distributed stochastic algorithms for minimizing smooth and non-convex objective functions under communication compression. We establish a convergence lower bound for algorithms using either unbiased or contractive compressors, in either the unidirectional or the bidirectional setting. To close the gap between the lower bound and the existing upper bounds, we further propose an algorithm, NEOLITHIC, which almost reaches our lower bound (up to logarithmic factors) under mild conditions. Our results also show that using contractive bidirectional compression can yield iterative methods that converge as fast as those using unbiased unidirectional compression. The experimental results validate our findings.
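
As an illustration of the two compressor properties named in the abstract, here is a minimal Python sketch. It assumes two standard textbook operators that the abstract itself does not name: random-k sparsification with d/k rescaling (unbiased, E[C(x)] = x) and top-k sparsification (contractive, ||C(x) - x||^2 <= (1 - k/d) ||x||^2). The function names `rand_k`, `top_k`, and `sq_norm` are hypothetical and are not taken from the paper.

```python
import random

def rand_k(x, k, rng):
    """Assumed unbiased compressor: keep k random coordinates, rescale by d/k."""
    d = len(x)
    keep = set(rng.sample(range(d), k))
    return [xi * d / k if i in keep else 0.0 for i, xi in enumerate(x)]

def top_k(x, k):
    """Assumed contractive compressor: keep the k largest-magnitude coordinates."""
    d = len(x)
    keep = set(sorted(range(d), key=lambda i: abs(x[i]), reverse=True)[:k])
    return [xi if i in keep else 0.0 for i, xi in enumerate(x)]

def sq_norm(v):
    """Squared Euclidean norm."""
    return sum(vi * vi for vi in v)

x = [3.0, -1.0, 0.5, 2.0]
k = 2
rng = random.Random(0)

# Unbiasedness check: the empirical mean of rand_k(x) should approach x.
n = 20000
mean = [0.0] * len(x)
for _ in range(n):
    c = rand_k(x, k, rng)
    mean = [m + ci / n for m, ci in zip(mean, c)]

# Contraction check: top-k error against the (1 - k/d) * ||x||^2 bound.
err = sq_norm([a - b for a, b in zip(top_k(x, k), x)])
bound = (1 - k / len(x)) * sq_norm(x)
```

In this sketch, top-k is deterministic and biased, while random-k is noisy but correct on average; these are the two regimes the abstract distinguishes.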




Lower Bounds and Accelerated Algorithms in Distributed Stochastic Optimization with Communication Compression

Communication compression is an essential strategy for alleviating commu...

Unbiased Compression Saves Communication in Distributed Optimization: When and How Much?

Communication compression is a common technique in distributed optimizat...

Artemis: tight convergence guarantees for bidirectional compression in Federated Learning

We introduce a new algorithm - Artemis - tackling the problem of learnin...

Optimal Gradient Compression for Distributed and Federated Learning

Communicating information, like gradient vectors, between computing node...

Preserved central model for faster bidirectional compression in distributed settings

We develop a new approach to tackle communication constraints in a distr...

Interaction is necessary for distributed learning with privacy or communication constraints

Local differential privacy (LDP) is a model where users send privatized ...

Sign Bit is Enough: A Learning Synchronization Framework for Multi-hop All-reduce with Ultimate Compression

Traditional one-bit compressed stochastic gradient descent can not be di...
