How noise affects the Hessian spectrum in overparameterized neural networks

10/01/2019
by   Mingwei Wei, et al.
0

Stochastic gradient descent (SGD) forms the core optimization method for deep neural networks. While some theoretical progress has been made, it still remains unclear why SGD leads the learning dynamics in overparameterized networks to solutions that generalize well. Here we show that for overparameterized networks with a degenerate valley in their loss landscape, SGD on average decreases the trace of the Hessian of the loss. We also generalize this result to other noise structures and show that isotropic noise in the non-degenerate subspace of the Hessian decreases its determinant. In addition to explaining SGDs role in sculpting the Hessian spectrum, this opens the door to new optimization approaches that may confer better generalization performance. We test our results with experiments on toy models and deep neural networks.

READ FULL TEXT
research
12/07/2020

A Deeper Look at the Hessian Eigenspectrum of Deep Neural Networks and its Applications to Regularization

Loss landscape analysis is extremely useful for a deeper understanding o...
research
07/24/2019

Hessian based analysis of SGD for Deep Nets: Dynamics and Generalization

While stochastic gradient descent (SGD) and variants have been surprisin...
research
10/01/2019

The asymptotic spectrum of the Hessian of DNN throughout training

The dynamics of DNNs during gradient descent is described by the so-call...
research
05/28/2019

SGD on Neural Networks Learns Functions of Increasing Complexity

We perform an experimental study of the dynamics of Stochastic Gradient ...
research
11/12/2020

Ridge Rider: Finding Diverse Solutions by Following Eigenvectors of the Hessian

Over the last decade, a single algorithm has changed many facets of our ...
research
06/16/2020

Flatness is a False Friend

Hessian based measures of flatness, such as the trace, Frobenius and spe...
research
12/16/2019

PyHessian: Neural Networks Through the Lens of the Hessian

We present PyHessian, a new scalable framework that enables fast computa...

Please sign up or login with your details

Forgot password? Click here to reset