Sharpness-Aware Minimization for Efficiently Improving Generalization

10/03/2020
by Pierre Foret, et al.

In today's heavily overparameterized models, the value of the training loss provides few guarantees on model generalization ability. Indeed, optimizing only the training loss value, as is commonly done, can easily lead to suboptimal model quality. Motivated by the connection between the geometry of the loss landscape and generalization—including a generalization bound that we prove here—we introduce a novel, effective procedure for instead simultaneously minimizing loss value and loss sharpness. In particular, our procedure, Sharpness-Aware Minimization (SAM), seeks parameters that lie in neighborhoods having uniformly low loss; this formulation results in a min-max optimization problem on which gradient descent can be performed efficiently. We present empirical results showing that SAM improves model generalization across a variety of benchmark datasets (e.g., CIFAR-10, CIFAR-100, ImageNet, fine-tuning tasks) and models, yielding novel state-of-the-art performance for several of them. Additionally, we find that SAM natively provides robustness to label noise on par with that provided by state-of-the-art procedures that specifically target learning with noisy labels.
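Concretely, SAM replaces the standard objective min_w L(w) with the min-max objective min_w max_{||ε||₂ ≤ ρ} L(w + ε), where ρ is the radius of the neighborhood; the inner maximization is approximated by a single first-order ascent step, so each update needs only two gradient computations. The PyTorch-style sketch below illustrates one such update; the function name sam_step, its signature, and the default ρ = 0.05 are illustrative assumptions on our part, not the authors' released implementation:

```python
import torch

def sam_step(model, loss_fn, inputs, targets, base_optimizer, rho=0.05):
    """One SAM update (illustrative sketch): ascend to an approximate
    worst-case point in an L2 ball of radius rho, then descend from there.
    `base_optimizer` is any torch.optim optimizer over model.parameters().
    """
    # 1) Gradient of the loss at the current weights w.
    loss_fn(model(inputs), targets).backward()

    # 2) First-order worst-case perturbation: eps = rho * g / ||g||_2.
    params = [p for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([p.grad.norm(p=2) for p in params]), p=2)
    perturbations = []
    with torch.no_grad():
        for p in params:
            eps = rho * p.grad / (grad_norm + 1e-12)
            p.add_(eps)  # move to w + eps
            perturbations.append(eps)
    model.zero_grad()

    # 3) Gradient of the loss at the perturbed weights w + eps.
    loss_fn(model(inputs), targets).backward()

    # 4) Restore w, then descend using the gradient taken at w + eps.
    with torch.no_grad():
        for p, eps in zip(params, perturbations):
            p.sub_(eps)
    base_optimizer.step()
    base_optimizer.zero_grad()
```

Because the gradient is evaluated both at w and at w + ε, a SAM step costs roughly two forward-backward passes per batch, which is the price paid for the sharpness-aware update.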

READ FULL TEXT


Related research

09/14/2023
Gradient constrained sharpness-aware prompt learning for vision-language models
This paper targets a novel trade-off problem in generalizable prompt lea...

02/23/2021
ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning of Deep Neural Networks
Recently, learning algorithms motivated from sharpness of loss surface a...

07/20/2023
Flatness-Aware Minimization for Domain Generalization
Domain generalization (DG) seeks to learn robust models that generalize ...

01/16/2023
Stability Analysis of Sharpness-Aware Minimization
Sharpness-aware minimization (SAM) is a recently proposed training metho...

10/11/2022
Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach
Deep neural networks often suffer from poor generalization caused by com...

11/21/2022
Efficient Generalization Improvement Guided by Random Weight Perturbation
To fully uncover the great potential of deep neural networks (DNNs), var...
