A modern look at the relationship between sharpness and generalization

02/14/2023
by Maksym Andriushchenko, et al.

Sharpness of minima is a promising quantity that can correlate with generalization in deep networks and, when optimized during training, can improve generalization. However, standard sharpness is not invariant under reparametrizations of neural networks; to fix this, reparametrization-invariant sharpness definitions have been proposed, most prominently adaptive sharpness (Kwon et al., 2021). But does it really capture generalization in modern practical settings? We comprehensively explore this question in a detailed study of various definitions of adaptive sharpness in settings ranging from training from scratch on ImageNet and CIFAR-10 to fine-tuning CLIP on ImageNet and BERT on MNLI. We focus mostly on transformers, for which little is known in terms of sharpness despite their widespread use. Overall, we observe that sharpness does not correlate well with generalization but rather with some training parameters, such as the learning rate, that can be positively or negatively correlated with generalization depending on the setup. Interestingly, in multiple cases we observe a consistent negative correlation of sharpness with out-of-distribution error, implying that sharper minima can generalize better. Finally, we illustrate on a simple model that the right sharpness measure is highly data-dependent, and that we do not yet understand this aspect well for realistic data distributions. The code of our experiments is available at https://github.com/tml-epfl/sharpness-vs-generalization.
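To make the notion of adaptive sharpness concrete, the sketch below estimates an average-case variant: the expected increase in training loss under random perturbations whose per-parameter scale is proportional to |w|, which is what makes the measure invariant to per-parameter rescalings of the network. This is an illustrative sketch, not the paper's exact evaluation protocol; the names `model`, `loss_fn`, `batch` and the values of `rho` and `n_samples` are assumptions for the example.

```python
# Minimal sketch (assumed setup, not the paper's exact protocol):
# estimate average-case adaptive sharpness
#   E_delta[ L(w + delta) ] - L(w),  delta ~ N(0, rho^2 * diag(|w|^2))
import copy
import torch

def adaptive_sharpness(model, loss_fn, batch, rho=0.05, n_samples=10):
    x, y = batch
    with torch.no_grad():
        base_loss = loss_fn(model(x), y).item()
        total = 0.0
        for _ in range(n_samples):
            perturbed = copy.deepcopy(model)
            for p in perturbed.parameters():
                # elementwise scaling c = |w| makes the perturbation, and
                # hence the sharpness value, invariant to per-parameter
                # rescalings of the weights
                p.add_(rho * p.abs() * torch.randn_like(p))
            total += loss_fn(perturbed(x), y).item()
    return total / n_samples - base_loss
```

The worst-case definition replaces the expectation with a maximization over perturbations satisfying an elementwise-rescaled norm constraint; both reduce to standard (non-adaptive) sharpness when the scaling vector is all ones.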


