EF21 with Bells and Whistles: Practical Algorithmic Extensions of Modern Error Feedback

10/07/2021
by Ilyas Fatkhullin, et al.

First proposed by Seide et al. (2014) as a heuristic, error feedback (EF) is a very popular mechanism for enforcing convergence of distributed gradient-based optimization methods enhanced with communication compression strategies based on the application of contractive compression operators. However, existing theory of EF relies on very strong assumptions (e.g., bounded gradients) and provides pessimistic convergence rates (e.g., while the best known rate for EF in the smooth nonconvex regime, when full gradients are compressed, is O(1/T^{2/3}), the rate of gradient descent in the same regime is O(1/T)). Recently, Richtárik et al. (2021) proposed a new error feedback mechanism, EF21, based on the construction of a Markov compressor induced by a contractive compressor. EF21 removes the aforementioned theoretical deficiencies of EF and at the same time works better in practice. In this work we propose six practical extensions of EF21, all supported by strong convergence theory: partial participation, stochastic approximation, variance reduction, proximal setting, momentum, and bidirectional compression. Several of these techniques were never analyzed in conjunction with EF before, and in cases where they were (e.g., bidirectional compression), our rates are vastly superior.
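
As a rough illustration (not the authors' code), the sketch below simulates the EF21 loop from Richtárik et al. (2021) in plain Python/NumPy, assuming a Top-K operator as the contractive compressor; the names ef21 and top_k and the toy hyperparameters lr, k, iters are illustrative assumptions, and the compressed difference c is the only quantity a node would actually communicate.

```python
import numpy as np

def top_k(v, k):
    # Top-K contractive compressor: keep the k largest-magnitude entries, zero the rest.
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def ef21(grads, x0, lr=0.1, k=1, iters=1000):
    # Single-process simulation of the EF21 loop.
    # grads[i](x) returns node i's local gradient; in a distributed run only the
    # compressed difference c below would be sent over the network.
    n = len(grads)
    x = x0.copy()
    g = [grads[i](x) for i in range(n)]   # local shifts g_i^0 (communicated once at the start)
    g_avg = sum(g) / n                    # server-side aggregate of the shifts
    for _ in range(iters):
        x = x - lr * g_avg                        # gradient-type step using the aggregate shift
        for i in range(n):
            c = top_k(grads[i](x) - g[i], k)      # node i compresses the change in its gradient
            g[i] = g[i] + c                       # local shift update (the Markov compressor state)
            g_avg = g_avg + c / n                 # server applies the same update from the message
    return x
```

With k equal to the full dimension the compressor is the identity, the shifts equal the true local gradients, and the loop reduces to plain distributed gradient descent, which matches the O(1/T) baseline rate quoted above.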

Related research

06/09/2021 · EF21: A New, Simpler, Theoretically Better, and Practically Faster Error Feedback
05/30/2023 · Clip21: Error Feedback for Gradient Clipping
09/30/2022 · EF21-P and Friends: Improved Theoretical Communication Complexity for Distributed Optimization with Bidirectional Compression
05/24/2023 · Momentum Provably Improves Error Feedback!
10/31/2022 · Adaptive Compression for Communication-Efficient Distributed Training
06/12/2020 · A Unified Analysis of Stochastic Gradient Methods for Nonconvex Federated Optimization
02/24/2021 · Preserved central model for faster bidirectional compression in distributed settings
