Convergence and concentration properties of constant step-size SGD through Markov chains

06/20/2023
by Ibrahim Merad, et al.

We consider the optimization of a smooth and strongly convex objective using constant step-size stochastic gradient descent (SGD) and study its properties through the prism of Markov chains. We show that, for unbiased gradient estimates with mildly controlled variance, the iterates converge to an invariant distribution in total variation distance. We also establish this convergence in Wasserstein-2 distance, in a more general setting than previous work. Thanks to the invariance property of the limit distribution, our analysis shows that the limit inherits sub-Gaussian or sub-exponential concentration properties when these hold for the gradient. This allows the derivation of high-confidence bounds for the final estimate. Finally, under such conditions in the linear case, we obtain a dimension-free deviation bound for the Polyak-Ruppert average of a tail sequence. All our results are non-asymptotic and their consequences are discussed through a few applications.
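To make the setting concrete, here is a minimal sketch (not taken from the paper) of the objects the abstract discusses: constant step-size SGD run as a homogeneous Markov chain on a toy strongly convex quadratic (the "linear case"), followed by Polyak-Ruppert averaging of a tail sequence. All names and numbers below (A, b, gamma, burn_in, noise_scale) are illustrative assumptions, not values specified by the authors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy strongly convex objective: f(theta) = 0.5 * theta^T A theta - b^T theta,
# with A positive definite, so the minimizer is theta_star = A^{-1} b.
d = 10
A = np.diag(np.linspace(1.0, 5.0, d))   # eigenvalues in [mu, L] = [1, 5]
b = rng.normal(size=d)
theta_star = np.linalg.solve(A, b)

def stochastic_gradient(theta, noise_scale=0.5):
    """Unbiased gradient estimate with Gaussian (hence sub-Gaussian) noise."""
    return A @ theta - b + noise_scale * rng.normal(size=d)

gamma = 0.05            # constant step size (hypothetical choice, gamma < 2/L)
n_iters = 20_000
burn_in = n_iters // 2  # discard the transient; average only the tail sequence

theta = np.zeros(d)
tail_sum = np.zeros(d)
for k in range(n_iters):
    # With a constant step size, this update defines a homogeneous Markov chain.
    theta = theta - gamma * stochastic_gradient(theta)
    if k >= burn_in:
        tail_sum += theta

theta_pr = tail_sum / (n_iters - burn_in)  # Polyak-Ruppert average of the tail

print("last iterate error: ", np.linalg.norm(theta - theta_star))
print("tail-averaged error:", np.linalg.norm(theta_pr - theta_star))
```

Under these assumptions, the last iterate keeps fluctuating around the minimizer at a scale set by gamma (the invariant distribution), while the tail average concentrates much more tightly, which is the behavior the paper's deviation bounds quantify.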


Related research

07/20/2017
Bridging the Gap between Constant Step Size Stochastic Gradient Descent and Markov Chains
We consider the minimization of an objective function given access to un...

06/14/2020
An Analysis of Constant Step Size SGD in the Non-convex Regime: Asymptotic Normality and Bias
Structured non-convex learning problems, for which critical points have ...

04/29/2019
Making the Last Iterate of SGD Information Theoretically Optimal
Stochastic gradient descent (SGD) is one of the most widely used algorit...

06/28/2021
The Convergence Rate of SGD's Final Iterate: Analysis on Dimension Dependence
Stochastic Gradient Descent (SGD) is among the simplest and most popular...

06/28/2023
Stochastic Methods in Variational Inequalities: Ergodicity, Bias and Refinements
For min-max optimization and variational inequalities problems (VIP) enc...

07/19/2022
A sharp uniform-in-time error estimate for Stochastic Gradient Langevin Dynamics
We establish a sharp uniform-in-time error estimate for the Stochastic G...

02/05/2021
Last iterate convergence of SGD for Least-Squares in the Interpolation regime
Motivated by the recent successes of neural networks that have the abili...
