Algorithmic Stability of Heavy-Tailed SGD with General Loss Functions

01/27/2023
by Anant Raj, et al.

Heavy-tail phenomena in stochastic gradient descent (SGD) have been reported in several empirical studies. Experimental evidence in previous works suggests a strong interplay between the heaviness of the tails and the generalization behavior of SGD. To address these empirical phenomena theoretically, several works have made strong topological and statistical assumptions to link the generalization error to heavy tails. Very recently, new generalization bounds have been proven, indicating a non-monotonic relationship between the generalization error and heavy tails, which is more pertinent to the reported empirical observations. While these bounds do not require additional topological assumptions, given that SGD can be modeled using a heavy-tailed stochastic differential equation (SDE), they apply only to simple quadratic problems. In this paper, we build on this line of research and develop generalization bounds for a more general class of objective functions, including non-convex functions. Our approach is based on developing Wasserstein stability bounds for heavy-tailed SDEs and their discretizations, which we then convert to generalization bounds. Our results do not require any nontrivial assumptions; yet, they shed more light on the empirical observations, thanks to the generality of the loss functions.
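As a point of reference for what the heavy-tailed SDE model looks like in code, below is a minimal Python sketch of SGD driven by symmetric alpha-stable noise, i.e., an Euler discretization of d(theta_t) = -grad f(theta_t) dt + sigma dL_t^alpha. The helper names (sample_alpha_stable, heavy_tailed_sgd), the toy non-convex objective, and all hyperparameter values are illustrative assumptions, not the construction or experiments of the paper.

import numpy as np

def sample_alpha_stable(alpha, size, rng):
    # Symmetric alpha-stable draws via the Chambers-Mallows-Stuck method (valid for alpha != 1).
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    return (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)) * \
           (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha)

def heavy_tailed_sgd(grad_fn, theta0, eta=1e-3, sigma=0.1, alpha=1.8, n_iters=10000, seed=0):
    # Euler discretization of the SDE d(theta_t) = -grad f(theta_t) dt + sigma dL_t^alpha,
    # where L_t^alpha is a symmetric alpha-stable Levy process.
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    for _ in range(n_iters):
        noise = sample_alpha_stable(alpha, theta.shape, rng)
        # Stable increments over a step of length eta scale as eta**(1/alpha)
        # (only in the Brownian case alpha = 2 do they scale as sqrt(eta)).
        theta = theta - eta * grad_fn(theta) + sigma * eta ** (1.0 / alpha) * noise
    return theta

# Toy non-convex loss f(theta) = sum(theta_i^2 + 0.5 * sin(3 * theta_i)) and its gradient.
grad = lambda th: 2.0 * th + 1.5 * np.cos(3.0 * th)
print(heavy_tailed_sgd(grad, theta0=np.ones(5)))

Smaller values of alpha (closer to 1) yield heavier-tailed increments, while alpha = 2 recovers the usual Gaussian-noise diffusion.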


Related research

06/02/2022  Algorithmic Stability of Heavy-Tailed Stochastic Gradient Descent on Least Squares
Recent studies have shown that heavy tails can emerge in stochastic opti...

05/21/2018  Learning with Non-Convex Truncated Losses by SGD
Learning with a convex loss function has been a dominating paradigm for...

06/15/2020  Fine-Grained Analysis of Stability and Generalization for Stochastic Gradient Descent
Recently there are a considerable amount of work devoted to the study of...

06/16/2020  Hausdorff Dimension, Stochastic Differential Equations, and Generalization in Neural Networks
Despite its success in a wide range of applications, characterizing the ...

05/23/2022  Chaotic Regularization and Heavy-Tailed Limits for Deterministic Gradient Descent
Recent studies have shown that gradient descent (GD) can achieve improve...

01/18/2019  A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks
The gradient noise (GN) in the stochastic gradient descent (SGD) algorit...

06/13/2023  Implicit Compressibility of Overparametrized Neural Networks Trained with Heavy-Tailed SGD
Neural network compression has been an increasingly important subject, d...
