Emergent properties of the local geometry of neural loss landscapes

10/14/2019
by Stanislav Fort, et al.

The local geometry of high-dimensional neural network loss landscapes can both challenge our cherished theoretical intuitions and dramatically impact the practical success of neural network training. Indeed, recent works have observed four striking local properties of neural loss landscapes on classification tasks: (1) the landscape exhibits exactly C directions of high positive curvature, where C is the number of classes; (2) gradient directions are largely confined to this extremely low-dimensional subspace of positive Hessian curvature, leaving the vast majority of directions in weight space unexplored; (3) gradient descent transiently explores intermediate regions of higher positive curvature before eventually finding flatter minima; (4) training can be successful even when confined to low-dimensional random affine hyperplanes, as long as these hyperplanes intersect a Goldilocks zone of higher-than-average curvature. We develop a simple theoretical model of gradients and Hessians, justified by numerical experiments on architectures and datasets used in practice, that simultaneously accounts for all four of these surprising and seemingly unrelated properties. Our unified model provides conceptual insight into the emergence of these properties and makes connections with diverse topics in neural networks, random matrix theory, and spin glasses, including the neural tangent kernel, BBP phase transitions, and Derrida's random energy model.
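
Properties (1) and (2) are straightforward to probe numerically on a toy model. The following is a minimal PyTorch sketch, assuming a tiny one-hidden-layer tanh classifier on synthetic Gaussian data rather than the architectures and datasets used in the paper; all names and sizes (loss_fn, theta, C, D, H, N) are illustrative choices. It computes the full Hessian of the loss, looks for roughly C outlier eigenvalues at the top of the spectrum, and measures how much of the gradient's norm lies in the top-C eigenspace.

    import torch

    torch.manual_seed(0)
    C, D, H, N = 5, 20, 16, 256          # classes, input dim, hidden units, samples
    X = torch.randn(N, D)
    y = torch.randint(0, C, (N,))

    n_params = D * H + H * C             # keep the net tiny so the full Hessian fits in memory
    theta = 0.1 * torch.randn(n_params)

    def loss_fn(theta):
        W1 = theta[: D * H].reshape(D, H)
        W2 = theta[D * H:].reshape(H, C)
        logits = torch.tanh(X @ W1) @ W2
        return torch.nn.functional.cross_entropy(logits, y)

    # Full Hessian of the loss at theta; eigh returns eigenvalues in ascending order.
    hess = torch.autograd.functional.hessian(loss_fn, theta)
    evals, evecs = torch.linalg.eigh(hess)
    print("largest eigenvalues:", evals[-(C + 3):])   # look for ~C outliers at the top

    # Property (2): fraction of gradient norm lying in the top-C Hessian eigenspace.
    grad = torch.autograd.functional.jacobian(loss_fn, theta)
    top_c = evecs[:, -C:]
    frac = (top_c.T @ grad).norm() ** 2 / grad.norm() ** 2
    print(f"gradient energy in top-{C} Hessian subspace: {frac:.3f}")

Exact eigendecomposition only scales to toy models; for networks of practical size, the top of the spectrum is typically estimated with Hessian-vector products and an iterative eigensolver such as Lanczos.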
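
Property (4), training restricted to a random low-dimensional affine hyperplane, can be sketched in the same toy setting: fix a random offset theta0 and a random basis P, and optimize only the d subspace coordinates z, so the full weight vector is always theta0 + P z. This is again a hedged illustration reusing loss_fn and n_params from the sketch above; d, the scaling of P, and the learning rate are arbitrary choices, not values from the paper.

    # Reuses loss_fn and n_params from the sketch above.
    d = 50                                          # dimension of the random hyperplane
    theta0 = 0.1 * torch.randn(n_params)            # random offset (not trained)
    P = torch.randn(n_params, d) / n_params ** 0.5  # random basis (not trained)
    z = torch.zeros(d, requires_grad=True)          # the only trainable coordinates

    opt = torch.optim.SGD([z], lr=0.2)
    for step in range(1000):
        opt.zero_grad()
        loss = loss_fn(theta0 + P @ z)              # weights never leave the affine plane
        loss.backward()
        opt.step()
        if step % 200 == 0:
            print(step, loss.item())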


Related research

- Visualizing high-dimensional loss landscapes with Hessian directions (08/28/2022): Analyzing geometric properties of high-dimensional loss functions, such ...
- The Goldilocks zone: Towards better understanding of neural network loss landscapes (07/06/2018): We explore the loss landscape of fully-connected neural networks using r...
- Charting the Topography of the Neural Network Landscape with Thermal-Like Noise (04/03/2023): The training of neural networks is a complex, high-dimensional, non-conv...
- Diffusion Curvature for Estimating Local Curvature in High Dimensional Data (06/08/2022): We introduce a new intrinsic measure of local curvature on point-cloud d...
- Explaining the Adaptive Generalisation Gap (11/15/2020): We conjecture that the reason for the difference in generalisation betwe...
- SANE: The phases of gradient descent through Sharpness Adjusted Number of Effective parameters (05/29/2023): Modern neural networks are undeniably successful. Numerous studies have ...
- MLRG Deep Curvature (12/20/2019): We present MLRG Deep Curvature suite, a PyTorch-based, open-source packa...
