Optimal Gradient Compression for Distributed and Federated Learning

by   Alyazeed Albasyoni, et al.

Communicating information, like gradient vectors, between computing nodes in distributed and federated learning is typically an unavoidable burden, resulting in scalability issues. Indeed, communication might be slow and costly. Recent advances in communication-efficient training algorithms have reduced this bottleneck by using compression techniques, in the form of sparsification, quantization, or low-rank approximation. Since compression is a lossy, or inexact, process, the iteration complexity is typically worsened; but the total communication complexity can improve significantly, possibly leading to large computation time savings. In this paper, we investigate the fundamental trade-off between the number of bits needed to encode compressed vectors and the compression error. We perform both worst-case and average-case analysis, providing tight lower bounds. In the worst-case analysis, we introduce an efficient compression operator, Sparse Dithering, which is very close to the lower bound. In the average-case analysis, we design a simple compression operator, Spherical Compression, which naturally achieves the lower bound. Thus, our new compression schemes significantly outperform the state of the art. We conduct numerical experiments to illustrate this improvement.


page 1

page 2

page 3

page 4


Lower Bounds and Nearly Optimal Algorithms in Distributed Learning with Communication Compression

Recent advances in distributed optimization and learning have shown that...

Hyper-Sphere Quantization: Communication-Efficient SGD for Federated Learning

The high cost of communicating gradients is a major bottleneck for feder...

DRIVE: One-bit Distributed Mean Estimation

We consider the problem where n clients transmit d-dimensional real-valu...

Artemis: tight convergence guarantees for bidirectional compression in Federated Learning

We introduce a new algorithm - Artemis - tackling the problem of learnin...

Communication Complexity of One-Shot Remote State Preparation

Quantum teleportation uses prior shared entanglement and classical commu...

Optimal Compression of Unit Norm Vectors in the High Distortion Regime

Motivated by the need for communication-efficient distributed learning, ...

Please sign up or login with your details

Forgot password? Click here to reset