Multiplicative noise and heavy tails in stochastic optimization

06/11/2020
by Liam Hodgkinson, et al.

Although stochastic optimization is central to modern machine learning, the precise mechanisms underlying its success, and in particular the role of the stochasticity, remain unclear. Modelling stochastic optimization algorithms as discrete random recurrence relations, we show that multiplicative noise, as it commonly arises due to variance in local rates of convergence, results in heavy-tailed stationary behaviour in the parameters. A detailed analysis is conducted for SGD applied to a simple linear regression problem, followed by theoretical results for a much larger class of models (including non-linear and non-convex) and optimizers (including momentum, Adam, and stochastic Newton), demonstrating that our qualitative results hold much more generally. In each case, we describe dependence on key factors, including step size, batch size, and data variability, all of which exhibit similar qualitative behaviour to recent empirical results on state-of-the-art neural network models from computer vision and natural language processing. Furthermore, we empirically demonstrate how multiplicative noise and heavy-tailed structure improve capacity for basin hopping and exploration of non-convex loss surfaces, over commonly-considered stochastic dynamics with only additive noise and light-tailed structure.
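As a minimal illustration of the mechanism described in the abstract, the sketch below simulates SGD on a one-dimensional least-squares problem, where each minibatch update is a random linear (Kesten-type) recurrence w_{k+1} = A_k w_k + B_k with multiplicative factor A_k = 1 - eta * mean(a_i^2 over the batch). The feature distribution, step sizes, batch size, and the crude Hill estimator used to gauge the tail index are illustrative assumptions, not values or code from the paper.

```python
# Minimal sketch (illustrative only, not code from the paper): SGD on a
# one-dimensional least-squares problem.  Each minibatch step is a random
# linear recurrence  w_{k+1} = A_k * w_k + B_k  with multiplicative factor
# A_k = 1 - eta * mean(a_i^2 over the batch); variance in A_k is the
# multiplicative noise.  All parameter values below are assumptions.
import numpy as np

rng = np.random.default_rng(0)

n, w_true = 10_000, 2.0
a = rng.normal(0.0, 1.0, size=n)               # features (controls data variability)
b = a * w_true + rng.normal(0.0, 0.5, size=n)  # targets with additive label noise

def run_sgd(eta, batch, iters=100_000, burn_in=20_000):
    """Return post-burn-in SGD iterates w_k for the scalar least-squares loss."""
    w, trace = 0.0, []
    for k in range(iters):
        idx = rng.integers(0, n, size=batch)
        A = 1.0 - eta * np.mean(a[idx] ** 2)   # multiplicative part of the update
        B = eta * np.mean(a[idx] * b[idx])     # additive part of the update
        w = A * w + B
        if k >= burn_in:
            trace.append(w)
    return np.asarray(trace)

def hill_tail_index(samples, k=500):
    """Crude Hill estimate of the tail index from the k largest |deviations|."""
    x = np.sort(np.abs(samples - np.median(samples)))[-k:]
    return 1.0 / np.mean(np.log(x / x[0]))

# Heavier tails (smaller estimated index) are expected as the step size grows.
for eta in (0.1, 0.5, 0.9):
    trace = run_sgd(eta=eta, batch=1)
    print(f"eta = {eta:.1f}   Hill tail-index estimate ~ {hill_tail_index(trace):.2f}")
```

With these choices, larger step sizes and smaller batches make the multiplicative factor more variable, so the estimated tail index should shrink as eta grows, mirroring the qualitative dependence on step size, batch size, and data variability described in the abstract.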


Related research

07/25/2023 · High Probability Analysis for Non-Convex Stochastic Optimization with Clipping
08/25/2021 · Heavy-tailed Streaming Statistical Estimation
05/13/2022 · Heavy-Tail Phenomenon in Decentralized SGD
08/02/2021 · Generalization Properties of Stochastic Optimizers via Trajectory Analysis
01/19/2021 · On Monte-Carlo methods in convex stochastic optimization
02/13/2020 · Fractional Underdamped Langevin Dynamics: Retargeting SGD with Momentum under Heavy-Tailed Gradient Noise
02/20/2021 · Convergence Rates of Stochastic Gradient Descent under Infinite Noise Variance
