SANE: The phases of gradient descent through Sharpness Adjusted Number of Effective parameters

05/29/2023
by Lawrence Wang, et al.
Modern neural networks are undeniably successful. Numerous studies have investigated how the curvature of loss landscapes can affect the quality of solutions. In this work we consider the Hessian matrix during network training. We reiterate the connection between the number of "well-determined" or "effective" parameters and the generalisation performance of neural networks, and we demonstrate its use as a tool for model comparison. By considering the local curvature, we propose the Sharpness Adjusted Number of Effective parameters (SANE), a measure of effective dimensionality for the quality of solutions. We show that SANE is robust to large learning rates, which represent learning regimes that are attractive but (in)famously unstable. We provide evidence of, and characterise, the Hessian shifts across "loss basins" at large learning rates. Finally, extending our analysis to deeper neural networks, we provide an approximation to the full-network Hessian, exploiting the natural ordering of neural weights, and use this approximation to provide extensive empirical evidence for our claims.
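
As a rough illustration of what a count of "well-determined" parameters can look like, the sketch below uses the classical Bayesian definition, in which each Hessian eigenvalue λ contributes λ / (λ + α) for a regularisation strength α. This is an assumption made for illustration: the abstract does not state the formula the authors use, and the sharpness adjustment that defines SANE is not reproduced here. The function name, the handling of non-positive eigenvalues, and the example spectrum are all hypothetical.

```python
def effective_parameters(eigenvalues, alpha):
    """Classical effective number of parameters from a Hessian spectrum.

    Each eigenvalue lam contributes lam / (lam + alpha): directions with
    lam >> alpha count as roughly one parameter, directions with
    lam << alpha count as roughly zero.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    # Assumption: non-positive eigenvalues are treated as undetermined
    # directions and contribute nothing to the count.
    return sum(lam / (lam + alpha) for lam in eigenvalues if lam > 0)


# Hypothetical spectrum: two sharp directions, one borderline, two flat.
spectrum = [100.0, 10.0, 1.0, 0.01, 0.0]
n_eff = effective_parameters(spectrum, alpha=1.0)
print(f"{n_eff:.3f} effective parameters out of {len(spectrum)}")
```

With α = 1 the two sharpest directions count almost fully, the eigenvalue equal to α counts as one half, and the flat directions add almost nothing, so the total falls well below the raw parameter count.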

research
07/22/2023

The instabilities of large learning rate training: a loss landscape view

Modern neural networks are undeniably successful. Numerous works study h...
research
12/07/2020

A Deeper Look at the Hessian Eigenspectrum of Deep Neural Networks and its Applications to Regularization

Loss landscape analysis is extremely useful for a deeper understanding o...
research
02/19/2018

BDA-PCH: Block-Diagonal Approximation of Positive-Curvature Hessian for Training Neural Networks

We propose a block-diagonal approximation of the positive-curvature Hess...
research
09/03/2013

SKYNET: an efficient and robust neural network training tool for machine learning in astronomy

We present the first public release of our generic neural network traini...
research
10/14/2019

Emergent properties of the local geometry of neural loss landscapes

The local geometry of high dimensional neural network loss landscapes ca...
research
05/16/2023

The Hessian perspective into the Nature of Convolutional Neural Networks

While Convolutional Neural Networks (CNNs) have long been investigated a...
research
12/02/2019

On the Delta Method for Uncertainty Approximation in Deep Learning

The Delta method is a well known procedure used to quantify uncertainty ...
