
Bridging the Gap between Constant Step Size Stochastic Gradient Descent and Markov Chains

07/20/2017
by   Aymeric Dieuleveut, et al.

We consider the minimization of an objective function given access to unbiased estimates of its gradient, using stochastic gradient descent (SGD) with a constant step-size. While a detailed analysis was previously available only for quadratic functions, we provide an explicit asymptotic expansion of the moments of the averaged SGD iterates that outlines the dependence on the initial conditions, the effect of the noise and of the step-size, as well as the lack of convergence in the general (non-quadratic) case. For this analysis, we bring tools from Markov chain theory into the analysis of stochastic gradient descent and create new ones, similar to but distinct from those used in stochastic MCMC methods. We then show that Richardson-Romberg extrapolation can be used to get closer to the global optimum, and we demonstrate empirical improvements from the new extrapolation scheme.
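As a rough illustration of the extrapolation idea described above (a sketch, not the authors' code), the snippet below runs averaged constant step-size SGD at step sizes γ and 2γ on a hypothetical non-quadratic scalar objective and combines the two averages as 2·x̄_γ − x̄_{2γ}. Because the averaged iterate converges to x* + cγ + O(γ²) for some constant c, this Richardson-Romberg combination cancels the first-order bias in γ. The objective, step sizes, and noise model are all illustrative choices, not taken from the paper.

```python
import math
import random

def averaged_sgd(grad, x0, step, n_iters, noise_std, seed):
    """Constant step-size SGD with additive Gaussian gradient noise;
    returns the Polyak-Ruppert average of the iterates."""
    rng = random.Random(seed)
    x = x0
    running_sum = 0.0
    for _ in range(n_iters):
        noisy_grad = grad(x) + rng.gauss(0.0, noise_std)
        x -= step * noisy_grad
        running_sum += x
    return running_sum / n_iters

# Hypothetical non-quadratic objective f(x) = exp(x) - x, with unique
# minimizer x* = 0 (since f'(x) = exp(x) - 1). Its nonzero third
# derivative is what creates the O(step) bias of the averaged iterate.
grad = lambda x: math.exp(x) - 1.0

gamma = 0.1
xbar_small = averaged_sgd(grad, 0.0, gamma, 200_000, 1.0, seed=0)
xbar_large = averaged_sgd(grad, 0.0, 2 * gamma, 200_000, 1.0, seed=1)

# Richardson-Romberg extrapolation: cancel the O(step) term.
x_rr = 2 * xbar_small - xbar_large
print(xbar_small, xbar_large, x_rr)
```

Both plain averages settle a fixed distance away from the optimum that shrinks only linearly with the step-size, while the extrapolated estimate is accurate to second order, at the cost of one extra SGD run.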


Related research:

Error Lower Bounds of Constant Step-size Stochastic Gradient Descent (10/18/2019)
Stochastic Gradient Descent (SGD) plays a central role in modern machine...

Parameter Averaging for SGD Stabilizes the Implicit Bias towards Flat Regions (02/18/2023)
Stochastic gradient descent is a workhorse for training deep neural netw...

Constant Step Size Least-Mean-Square: Bias-Variance Trade-offs and Optimal Sampling Distributions (11/29/2014)
We consider the least-squares regression problem and provide a detailed ...

Sub-linear convergence of a tamed stochastic gradient descent method in Hilbert space (06/17/2021)
In this paper, we introduce the tamed stochastic gradient descent method...

Convergence of constant step stochastic gradient descent for non-smooth non-convex functions (05/18/2020)
This paper studies the asymptotic behavior of the constant step Stochast...

On Convergence-Diagnostic based Step Sizes for Stochastic Gradient Descent (07/01/2020)
Constant step-size Stochastic Gradient Descent exhibits two phases: a tr...

Cutting Some Slack for SGD with Adaptive Polyak Stepsizes (02/24/2022)
Tuning the step size of stochastic gradient descent is tedious and error...