Optimal Gradient Compression for Distributed and Federated Learning

10/07/2020
by Alyazeed Albasyoni, et al.

Communicating information, such as gradient vectors, between computing nodes in distributed and federated learning is typically an unavoidable burden, resulting in scalability issues. Indeed, communication may be slow and costly. Recent advances in communication-efficient training algorithms have reduced this bottleneck by using compression techniques, in the form of sparsification, quantization, or low-rank approximation. Since compression is a lossy (inexact) process, the iteration complexity typically worsens, but the total communication complexity can improve significantly, possibly leading to large savings in computation time. In this paper, we investigate the fundamental trade-off between the number of bits needed to encode compressed vectors and the resulting compression error. We perform both worst-case and average-case analyses, providing tight lower bounds. In the worst-case analysis, we introduce an efficient compression operator, Sparse Dithering, which comes very close to the lower bound. In the average-case analysis, we design a simple compression operator, Spherical Compression, which naturally achieves the lower bound. Thus, our new compression schemes significantly outperform the state of the art. We conduct numerical experiments to illustrate this improvement.
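The abstract does not spell out the Sparse Dithering or Spherical Compression constructions, so the sketch below is not the paper's method. It is a minimal illustration of one compression family mentioned above, unbiased random-k sparsification, and of the bits-versus-error trade-off the paper studies; the function name rand_k_compress and the crude bit count are illustrative assumptions.

```python
import numpy as np

def rand_k_compress(x, k, rng=None):
    """Unbiased random-k sparsification (illustrative, not the paper's operator):
    keep k coordinates chosen uniformly at random and rescale by d/k so that
    E[C(x)] = x."""
    rng = np.random.default_rng() if rng is None else rng
    d = x.shape[0]
    idx = rng.choice(d, size=k, replace=False)   # random support of size k
    out = np.zeros_like(x)
    out[idx] = x[idx] * (d / k)                  # rescale to keep the estimator unbiased
    return out

# Example: compare the compression error to a naive count of the bits needed
# to encode the output (k indices plus k float32 values, no entropy coding).
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
k = 100
cx = rand_k_compress(x, k, rng)
rel_err = np.linalg.norm(cx - x) ** 2 / np.linalg.norm(x) ** 2
bits = k * (np.ceil(np.log2(x.size)) + 32)
print(f"relative squared error ~ {rel_err:.2f}, encoded size ~ {bits:.0f} bits")
```

Shrinking k reduces the encoded size roughly linearly while the expected relative squared error grows as d/k - 1, which is exactly the kind of bits-versus-error trade-off whose fundamental limits the paper analyzes.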


Related research

02/20/2020 - Uncertainty Principle for Communication Compression in Distributed and Federated Learning and the Search for an Optimal Compressor
In order to mitigate the high communication cost in distributed and fede...

06/08/2022 - Lower Bounds and Nearly Optimal Algorithms in Distributed Learning with Communication Compression
Recent advances in distributed optimization and learning have shown that...

11/12/2019 - Hyper-Sphere Quantization: Communication-Efficient SGD for Federated Learning
The high cost of communicating gradients is a major bottleneck for feder...

05/18/2021 - DRIVE: One-bit Distributed Mean Estimation
We consider the problem where n clients transmit d-dimensional real-valu...

06/25/2020 - Artemis: tight convergence guarantees for bidirectional compression in Federated Learning
We introduce a new algorithm - Artemis - tackling the problem of learnin...

02/21/2018 - Communication Complexity of One-Shot Remote State Preparation
Quantum teleportation uses prior shared entanglement and classical commu...

07/16/2023 - Optimal Compression of Unit Norm Vectors in the High Distortion Regime
Motivated by the need for communication-efficient distributed learning, ...
