Bridging the Gap between Constant Step Size Stochastic Gradient Descent and Markov Chains

by   Aymeric Dieuleveut, et al.

We consider the minimization of an objective function given access to unbiased estimates of its gradient through stochastic gradient descent (SGD) with constant step-size. While the detailed analysis was only performed for quadratic functions, we provide an explicit asymptotic expansion of the moments of the averaged SGD iterates that outlines the dependence on initial conditions, the effect of noise and the step-size, as well as the lack of convergence in the general (non-quadratic) case. For this analysis, we bring tools from Markov chain theory into the analysis of stochastic gradient and create new ones (similar but different from stochastic MCMC methods). We then show that Richardson-Romberg extrapolation may be used to get closer to the global optimum and we show empirical improvements of the new extrapolation scheme.


page 1

page 2

page 3

page 4


Error Lower Bounds of Constant Step-size Stochastic Gradient Descent

Stochastic Gradient Descent (SGD) plays a central role in modern machine...

Parameter Averaging for SGD Stabilizes the Implicit Bias towards Flat Regions

Stochastic gradient descent is a workhorse for training deep neural netw...

Convergence and concentration properties of constant step-size SGD through Markov chains

We consider the optimization of a smooth and strongly convex objective u...

Constant Step Size Least-Mean-Square: Bias-Variance Trade-offs and Optimal Sampling Distributions

We consider the least-squares regression problem and provide a detailed ...

Sub-linear convergence of a tamed stochastic gradient descent method in Hilbert space

In this paper, we introduce the tamed stochastic gradient descent method...

Convergence of constant step stochastic gradient descent for non-smooth non-convex functions

This paper studies the asymptotic behavior of the constant step Stochast...

On Convergence-Diagnostic based Step Sizes for Stochastic Gradient Descent

Constant step-size Stochastic Gradient Descent exhibits two phases: a tr...

Please sign up or login with your details

Forgot password? Click here to reset