Stochastic Optimization with Non-stationary Noise

06/08/2020
by Jingzhao Zhang et al.

We investigate stochastic optimization problems under relaxed assumptions on the distribution of noise that are motivated by empirical observations in neural network training. Standard results on optimal convergence rates for stochastic optimization assume either that there exists a uniform bound on the moments of the gradient noise or that the noise decays as the algorithm progresses. These assumptions do not match the empirical behavior of optimization algorithms used in neural network training, where the noise level in stochastic gradients can even increase with time. We address this behavior by studying convergence rates of stochastic gradient methods subject to a changing second moment (or variance) of the stochastic oracle as the iterations progress. When the variation in the noise is known, we show that it is always beneficial to adapt the step size and exploit the noise variability. When the noise statistics are unknown, we obtain similar improvements by developing an online estimator of the noise level, thereby recovering close variants of RMSProp. Consequently, our results reveal an important scenario in which adaptive step-size methods outperform SGD.
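As a rough illustration of the step-size adaptation described in the abstract, the sketch below implements an SGD-style update whose per-coordinate step size is scaled by an online, RMSProp-style estimate of the stochastic gradient's second moment. The function name, hyperparameters, and the toy noisy-quadratic oracle are illustrative assumptions, not the authors' exact algorithm or experimental setup.

```python
import numpy as np

def noise_adaptive_sgd(grad_fn, x0, base_lr=1e-2, beta=0.99, eps=1e-8, steps=1000):
    """Hypothetical sketch: SGD whose step size is rescaled by an online
    (RMSProp-style) estimate of the stochastic gradient's second moment,
    so steps shrink when the estimated noise level grows. This is only an
    illustration of the general idea, not the paper's exact method."""
    x = np.asarray(x0, dtype=float)
    second_moment = np.zeros_like(x)  # running estimate of E[g^2]
    for _ in range(steps):
        g = grad_fn(x)  # one call to the stochastic gradient oracle
        # Exponential moving average of the squared gradient (noise proxy).
        second_moment = beta * second_moment + (1.0 - beta) * g**2
        # Effective step size is smaller where the estimated second moment is large.
        x -= base_lr * g / (np.sqrt(second_moment) + eps)
    return x

# Toy usage: a quadratic objective with additive Gaussian gradient noise.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grad = lambda x: 2.0 * x + rng.normal(scale=0.5, size=x.shape)
    print(noise_adaptive_sgd(grad, np.array([5.0, -3.0])))
```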


Related research

Randomized Smoothing for Stochastic Optimization (03/22/2011)
We analyze convergence rates of stochastic optimization procedures for n...

When Does Stochastic Gradient Algorithm Work Well? (01/18/2018)
In this paper, we consider a general stochastic optimization problem whi...

The Power of Adaptivity in SGD: Self-Tuning Step Sizes with Unbounded Gradients and Affine Variance (02/11/2022)
We study convergence rates of AdaGrad-Norm as an exemplar of adaptive st...

Performance Limits of Stochastic Sub-Gradient Learning, Part II: Multi-Agent Case (04/20/2017)
The analysis in Part I revealed interesting properties for subgradient l...

Stochastic first-order methods: non-asymptotic and computer-aided analyses via potential functions (02/03/2019)
We provide a novel computer-assisted technique for systematically analyz...

Understanding performance variability in standard and pipelined parallel Krylov solvers (03/21/2021)
In this work, we collect data from runs of Krylov subspace methods and p...

Follow the Signs for Robust Stochastic Optimization (05/22/2017)
Stochastic noise on gradients is now a common feature in machine learnin...
