Federated Random Reshuffling with Compression and Variance Reduction

by   Grigory Malinovsky, et al.

Random Reshuffling (RR), which is a variant of Stochastic Gradient Descent (SGD) employing sampling without replacement, is an immensely popular method for training supervised machine learning models via empirical risk minimization. Due to its superior practical performance, it is embedded and often set as default in standard machine learning software. Under the name FedRR, this method was recently shown to be applicable to federated learning (Mishchenko et al.,2021), with superior performance when compared to common baselines such as Local SGD. Inspired by this development, we design three new algorithms to improve FedRR further: compressed FedRR and two variance reduced extensions: one for taming the variance coming from shuffling and the other for taming the variance due to compression. The variance reduction mechanism for compression allows us to eliminate dependence on the compression parameter, and applying additional controlled linear perturbations for Random Reshuffling, introduced by Malinovsky et al.(2021) helps to eliminate variance at the optimum. We provide the first analysis of compressed local methods under standard assumptions without bounded gradient assumptions and for heterogeneous data, overcoming the limitations of the compression operator. We corroborate our theoretical results with experiments on synthetic and real data sets.


page 1

page 2

page 3

page 4


FLECS-CGD: A Federated Learning Second-Order Framework via Compression and Sketching with Compressed Gradient Differences

In the recent paper FLECS (Agafonov et al, FLECS: A Federated Learning S...

Random Reshuffling with Variance Reduction: New Analysis and Better Rates

Virtually all state-of-the-art methods for training supervised machine l...

Escaping Saddle Points with Compressed SGD

Stochastic gradient descent (SGD) is a prevalent optimization technique ...

Acceleration for Compressed Gradient Descent in Distributed and Federated Optimization

Due to the high communication cost in distributed and federated learning...

Don't Jump Through Hoops and Remove Those Loops: SVRG and Katyusha are Better Without the Outer Loop

The stochastic variance-reduced gradient method (SVRG) and its accelerat...

Gradient Descent with Compressed Iterates

We propose and analyze a new type of stochastic first order method: grad...

Convergence of Variance-Reduced Stochastic Learning under Random Reshuffling

Several useful variance-reduced stochastic gradient algorithms, such as ...

Please sign up or login with your details

Forgot password? Click here to reset