Quasi-potential theory for escape problem: Quantitative sharpness effect on SGD's escape from local minima

11/07/2021
by Hikaru Ibayashi et al.

We develop a quantitative theory of the escape problem for stochastic gradient descent (SGD) and investigate how the sharpness of loss surfaces affects the escape. Deep learning has achieved tremendous success in various domains, yet it has raised many open theoretical questions. A typical question is why SGD can find parameters that generalize well on non-convex loss surfaces. The escape problem, which investigates how efficiently SGD escapes from local minima, is one approach to this question. In this paper, we develop a quasi-potential theory for the escape problem by applying the theory of stochastic dynamical systems. We show that the quasi-potential theory handles both the geometric properties of loss surfaces and the covariance structure of gradient noise in a unified manner, whereas previous works have studied them separately. Our theoretical results imply that (i) the sharpness of loss surfaces contributes to the slow escape of SGD, and (ii) SGD's noise structure cancels this effect and exponentially accelerates the escape. We also conduct experiments with neural networks trained on real data to empirically validate our theory.
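
For intuition only, here is a minimal Python sketch of the escape problem the abstract refers to: noisy gradient descent on a one-dimensional quadratic basin f(x) = a x^2 / 2, where the curvature a stands in for sharpness. It is not the paper's quasi-potential analysis; the loss, the Gaussian noise model (in one variant crudely scaled with curvature), and every parameter value are assumptions made for illustration.

```python
import numpy as np

# Toy illustration (not the paper's quasi-potential analysis): measure how long a
# noisy gradient-descent iterate takes to leave the interval |x| < width around the
# minimum of a one-dimensional quadratic loss f(x) = sharpness * x**2 / 2.
# All parameter values below are hypothetical, chosen only so escapes occur quickly.

def escape_steps(sharpness, noise_std, lr=0.2, width=1.0, max_steps=200_000, seed=0):
    """Number of steps until x first leaves (-width, width), starting from x = 0."""
    rng = np.random.default_rng(seed)
    x = 0.0
    for step in range(1, max_steps + 1):
        noisy_grad = sharpness * x + noise_std * rng.standard_normal()
        x -= lr * noisy_grad
        if abs(x) >= width:
            return step
    return max_steps  # censored: no escape within the step budget


def mean_escape(sharpness, noise_std, trials=50):
    return np.mean([escape_steps(sharpness, noise_std, seed=s) for s in range(trials)])


for a in (1.0, 2.0):
    # (i) Curvature-independent noise: the sharper basin is harder to leave.
    iso = mean_escape(a, noise_std=1.2)
    # (ii) Noise whose scale grows with curvature (a crude stand-in for SGD-like
    #      gradient noise): the handicap of sharpness is largely cancelled.
    curv = mean_escape(a, noise_std=1.2 * np.sqrt(a))
    print(f"sharpness {a}: constant noise {iso:7.0f} steps | curvature-scaled noise {curv:7.0f} steps")
```

In this toy setting, the sharper basin takes markedly longer to escape under curvature-independent noise, while scaling the noise with curvature largely removes that gap, qualitatively mirroring claims (i) and (ii) above.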


research · 01/18/2019
Quasi-potential as an implicit regularizer for the loss function in the stochastic gradient descent
We interpret the variational inference of the Stochastic Gradient Descen...

research · 07/25/2021
SGD May Never Escape Saddle Points
Stochastic gradient descent (SGD) has been deployed to solve highly non-...

research · 01/07/2018
Theory of Deep Learning IIb: Optimization Properties of SGD
In Theory IIb we characterize with a mix of theory and experiments the o...

research · 07/07/2019
Quantitative W_1 Convergence of Langevin-Like Stochastic Processes with Non-Convex Potential State-Dependent Noise
We prove quantitative convergence rates at which discrete Langevin-like ...

research · 07/06/2022
When does SGD favor flat minima? A quantitative characterization via linear stability
The observation that stochastic gradient descent (SGD) favors flat minim...

research · 03/01/2020
Most Probable Dynamics of Stochastic Dynamical Systems with Exponentially Light Jump Fluctuations
The emergence of the exit events from a bounded domain containing a stab...

research · 05/20/2023
Evolutionary Algorithms in the Light of SGD: Limit Equivalence, Minima Flatness, and Transfer Learning
Whenever applicable, the Stochastic Gradient Descent (SGD) has shown its...
