Cyclic and Randomized Stepsizes Invoke Heavier Tails in SGD

02/10/2023
by   Mert Gurbuzbalaban, et al.

Cyclic and randomized stepsizes are widely used in deep learning practice and can often outperform standard choices such as the constant stepsize in SGD. Despite their empirical success, little is currently known about when and why they can theoretically improve generalization performance. We consider a general class of Markovian stepsizes for learning, which contains i.i.d. random stepsizes, cyclic stepsizes, and the constant stepsize as special cases. Motivated by the literature showing that the heaviness of the tails (measured by the so-called "tail-index") of the SGD iterates is correlated with generalization, we study the tail-index and provide a number of theoretical results demonstrating how it varies with the stepsize schedule. Our results bring a new understanding of the benefits of cyclic and randomized stepsizes over the constant stepsize in terms of tail behavior. We illustrate our theory on linear regression experiments and show through deep learning experiments that Markovian stepsizes can achieve even heavier tails and are a viable alternative to cyclic and i.i.d. randomized stepsize rules.
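To make the stepsize class concrete, here is a minimal, hypothetical sketch (not the authors' code) of SGD on linear regression with a two-state Markovian stepsize schedule. The function name, parameters, and the toy data are assumptions for illustration only: with switching probability `p_switch = 1.0` the schedule cycles deterministically between the two stepsizes, `p_switch = 0.5` gives i.i.d. uniform random stepsizes, and setting both stepsizes equal recovers the constant stepsize — the three special cases mentioned in the abstract.

```python
import numpy as np

def sgd_markovian_stepsize(X, y, etas=(0.005, 0.05), p_switch=1.0,
                           n_iters=5000, seed=0):
    """SGD for least squares with a two-state Markov chain over stepsizes.

    Hypothetical illustration: `etas` holds the two stepsize values, and at
    each iteration the chain stays in its current state or switches with
    probability `p_switch`.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    state = 0  # index of the current stepsize in `etas`
    for _ in range(n_iters):
        i = rng.integers(n)                    # sample one data point
        grad = (X[i] @ w - y[i]) * X[i]        # stochastic gradient of 1/2 (x_i^T w - y_i)^2
        w -= etas[state] * grad                # update with the current (Markovian) stepsize
        if rng.random() < p_switch:            # two-state Markov transition
            state = 1 - state
    return w

# Toy linear regression: recover w_true from noisy observations.
rng = np.random.default_rng(42)
X = rng.standard_normal((200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.standard_normal(200)

w_cyclic = sgd_markovian_stepsize(X, y, p_switch=1.0)   # cyclic schedule
w_iid = sgd_markovian_stepsize(X, y, p_switch=0.5)      # i.i.d. schedule
```

The tail-index analysis in the paper concerns the stationary distribution of such iterates; this sketch only demonstrates the scheduling mechanism itself, not the tail-index estimation.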


