Random Reshuffling with Variance Reduction: New Analysis and Better Rates

04/19/2021
by Grigory Malinovsky, et al.

Virtually all state-of-the-art methods for training supervised machine learning models are variants of SGD enhanced with a number of additional tricks, such as minibatching, momentum, and adaptive stepsizes. One of the tricks that works so well in practice that it is used as the default in virtually all widely used machine learning software is random reshuffling (RR). However, the practical benefits of RR have, until very recently, eluded a satisfactory theoretical explanation. Motivated by recent developments due to Mishchenko, Khaled and Richtárik (2020), in this work we provide the first analysis of SVRG under Random Reshuffling (RR-SVRG) for general finite-sum problems. First, we show that RR-SVRG converges linearly with rate 𝒪(κ^(3/2)) in the strongly convex case, which can be improved further to 𝒪(κ) in the big-data regime (when n > 𝒪(κ)), where κ is the condition number. This improves upon the previous best rate of 𝒪(κ^2) known for a variance-reduced RR method in the strongly convex case, due to Ying, Yuan and Sayed (2020). Second, we obtain the first sublinear rate for general convex problems. Third, we establish similar fast rates for Cyclic-SVRG and Shuffle-Once-SVRG. Finally, we develop and analyze a more general variance reduction scheme for RR, which allows for less frequent updates of the control variate. We corroborate our theoretical results with suitably chosen experiments on synthetic and real datasets.
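For readers who want a concrete picture of the method named in the abstract, below is a minimal Python sketch of SVRG combined with random reshuffling for a finite-sum problem F(x) = (1/n) Σᵢ fᵢ(x). The names grad_i, gamma, epochs, and the synthetic least-squares demo are illustrative assumptions, not the authors' reference implementation, and the paper's more general scheme with less frequent control-variate updates is omitted.

```python
# Minimal sketch of RR-SVRG (SVRG under Random Reshuffling); illustrative only.
import numpy as np

def rr_svrg(grad_i, n, x0, gamma, epochs, rng=None):
    """grad_i(i, x) should return the gradient of the i-th term f_i at x."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    for _ in range(epochs):
        w = x.copy()                                     # control variate (anchor point)
        full_grad = sum(grad_i(i, w) for i in range(n)) / n
        for i in rng.permutation(n):                     # random reshuffling: one pass without replacement
            # variance-reduced gradient estimator
            g = grad_i(i, x) - grad_i(i, w) + full_grad
            x = x - gamma * g
    return x

# Illustrative use on a synthetic least-squares problem.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A, x_star = rng.normal(size=(200, 5)), np.ones(5)
    b = A @ x_star
    grad = lambda i, x: A[i] * (A[i] @ x - b[i])
    x_hat = rr_svrg(grad, n=200, x0=np.zeros(5), gamma=0.01, epochs=50)
    print(np.linalg.norm(x_hat - x_star))
```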


Related research

05/08/2022 · Federated Random Reshuffling with Compression and Variance Reduction
Random Reshuffling (RR), which is a variant of Stochastic Gradient Desce...

06/18/2020 · Stochastic Variance Reduction via Accelerated Dual Averaging for Finite-Sum Optimization
In this paper, we introduce a simplified and unified method for finite-s...

01/18/2018 · Faster Algorithms for Large-scale Machine Learning using Simple Sampling Techniques
Nowadays, the major challenge in machine learning is the `Big Data' ch...

10/23/2020 · Linearly Converging Error Compensated SGD
In this paper, we propose a unified analysis of variants of distributed ...

08/11/2023 · Adaptive SGD with Polyak stepsize and Line-search: Robust Convergence and Variance Reduction
The recently proposed stochastic Polyak stepsize (SPS) and stochastic li...

05/02/2018 · SVRG meets SAGA: k-SVRG --- A Tale of Limited Memory
In recent years, many variance reduced algorithms for empirical risk min...

02/10/2023 · Cyclic and Randomized Stepsizes Invoke Heavier Tails in SGD
Cyclic and randomized stepsizes are widely used in the deep learning pra...
