On Convergence-Diagnostic based Step Sizes for Stochastic Gradient Descent

07/01/2020
by   Scott Pesme, et al.

Constant step-size Stochastic Gradient Descent exhibits two phases: a transient phase during which the iterates make fast progress towards the optimum, followed by a stationary phase during which the iterates oscillate around the optimal point. In this paper, we show that efficiently detecting this transition and appropriately decreasing the step size can lead to fast convergence rates. We analyse the classical statistical test proposed by Pflug (1983), based on the inner product between consecutive stochastic gradients. Even in the simple case where the objective function is quadratic, we show that this test cannot lead to an adequate convergence diagnostic. We then propose a novel and simple statistical procedure that accurately detects stationarity, and we provide experimental results showing state-of-the-art performance on synthetic and real-world datasets.
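To make the idea concrete, here is a minimal sketch of Pflug-style step-size adaptation on a toy quadratic objective. The intuition behind the test is that consecutive stochastic gradients are positively correlated during the transient phase and negatively correlated (in expectation) once the iterates oscillate around the optimum, so a running sum of their inner products turning negative signals stationarity. All function names, parameters, and the burn-in heuristic below are illustrative assumptions, not the procedure from the paper (which argues precisely that this classical test can fail).

```python
import numpy as np

def sgd_with_pflug_diagnostic(dim=10, n_steps=5000, step_size=0.5,
                              noise_std=0.1, burn_in=100, seed=0):
    """Constant step-size SGD on the quadratic f(x) = ||x||^2 / 2 with
    additive gradient noise. Pflug's diagnostic accumulates the inner
    products <g_t, g_{t-1}> of consecutive stochastic gradients; when
    the running sum turns negative (after a burn-in), the step size is
    halved and the statistic is restarted.
    Illustrative sketch only; names and defaults are assumptions."""
    rng = np.random.default_rng(seed)
    x = np.ones(dim)
    prev_grad = None
    running_sum = 0.0   # Pflug's statistic: sum of consecutive inner products
    count = 0
    restarts = []       # iterations at which the step size was halved
    for t in range(n_steps):
        grad = x + noise_std * rng.standard_normal(dim)  # stochastic gradient of f
        if prev_grad is not None:
            running_sum += prev_grad @ grad
        count += 1
        # Transient phase: gradients point consistently towards the optimum,
        # so the sum grows. Stationary phase: noise dominates and the
        # expected inner product is negative, dragging the sum below zero.
        if count >= burn_in and running_sum < 0:
            step_size /= 2.0
            running_sum, count = 0.0, 0
            restarts.append(t)
        x = x - step_size * grad
        prev_grad = grad
    return x, step_size, restarts
```

Running the sketch, the statistic typically triggers several halvings once the iterates reach the noise-dominated region, which is exactly the convergence-diagnostic behaviour the abstract refers to.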


Related research

10/17/2017
Convergence diagnostics for stochastic gradient descent with constant step size
Iterative procedures in stochastic optimization are typically comprised ...

08/27/2020
Understanding and Detecting Convergence for Stochastic Gradient Descent with Momentum
Convergence detection of iterative stochastic optimization methods is of...

07/20/2017
Bridging the Gap between Constant Step Size Stochastic Gradient Descent and Markov Chains
We consider the minimization of an objective function given access to un...

09/22/2021
On the equivalence of different adaptive batch size selection strategies for stochastic gradient descent methods
In this study, we demonstrate that the norm test and inner product/ortho...

06/15/2020
Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model
In the context of statistical supervised learning, the noiseless linear ...

02/05/2021
Last iterate convergence of SGD for Least-Squares in the Interpolation regime
Motivated by the recent successes of neural networks that have the abili...

09/19/2016
Geometrically Convergent Distributed Optimization with Uncoordinated Step-Sizes
A recent algorithmic family for distributed optimization, DIGing, has...
