Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach

10/11/2022
by Peng Mi, et al.

Deep neural networks often suffer from poor generalization caused by complex and non-convex loss landscapes. One of the popular solutions is Sharpness-Aware Minimization (SAM), which smooths the loss landscape by minimizing the maximized change of training loss when a perturbation is added to the weights. However, we find that SAM's indiscriminate perturbation of all parameters is suboptimal and also incurs excessive computation, i.e., double the overhead of common optimizers like Stochastic Gradient Descent (SGD). In this paper, we propose an efficient and effective training scheme coined Sparse SAM (SSAM), which achieves sparse perturbation through a binary mask. To obtain the sparse mask, we provide two solutions based on Fisher information and dynamic sparse training, respectively. In addition, we theoretically prove that SSAM converges at the same rate as SAM, i.e., O(log T/√T). Sparse SAM not only has the potential for training acceleration but also smooths the loss landscape effectively. Extensive experimental results on CIFAR10, CIFAR100, and ImageNet-1K confirm the superior efficiency of our method over SAM, and the performance is preserved or even improved with a perturbation of merely 50% sparsity. Code is available at https://github.com/Mi-Peng/Sparse-Sharpness-Aware-Minimization.
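To make the masked-perturbation idea concrete, below is a minimal sketch of a single Sparse SAM step in PyTorch. It is an illustration under simplifying assumptions, not the authors' implementation: the function ssam_step, its arguments, and the way the binary masks are supplied are hypothetical, and the paper's actual mask construction (Fisher information or dynamic sparse training) is omitted.

import torch

def ssam_step(model, loss_fn, data, target, base_optimizer, masks, rho=0.05):
    # One Sparse SAM step (illustrative sketch, not the authors' code):
    # only parameters selected by a binary mask receive the SAM perturbation.
    # `masks` is assumed to be a list of 0/1 tensors aligned with
    # model.parameters(); how the mask is built (Fisher-based or dynamic
    # sparse training) is the paper's contribution and is not reproduced here.

    # First forward/backward pass: gradient at the current weights.
    loss = loss_fn(model(data), target)
    loss.backward()

    # Global gradient norm, as in SAM's ascent step e = rho * g / ||g||.
    grad_norm = torch.norm(torch.stack(
        [p.grad.norm(p=2) for p in model.parameters() if p.grad is not None]), p=2)

    # Apply the sparsified ascent step e = rho * m ⊙ g / ||g||.
    eps = []
    with torch.no_grad():
        for p, m in zip(model.parameters(), masks):
            if p.grad is None:
                eps.append(None)
                continue
            e = rho * m * p.grad / (grad_norm + 1e-12)
            p.add_(e)          # move only the masked weights to the perturbed point
            eps.append(e)
    model.zero_grad()

    # Second forward/backward pass: gradient at the (sparsely) perturbed weights.
    loss_fn(model(data), target).backward()

    # Restore the original weights and take the actual descent step.
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)
    base_optimizer.step()
    model.zero_grad()
    return loss.item()

Because the mask zeroes most components of the ascent direction, only the selected weights move to the perturbed point, which is what opens the door to the training acceleration mentioned in the abstract while keeping the SAM-style flatness-seeking update.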


research  06/30/2023
Systematic Investigation of Sparse Perturbed Sharpness-Aware Minimization Optimizer
Deep neural networks often suffer from poor generalization due to comple...

research  10/07/2021
Efficient Sharpness-aware Minimization for Improved Training of Neural Networks
Overparametrized Deep Neural Networks (DNNs) often achieve astounding pe...

research  12/16/2021
δ-SAM: Sharpness-Aware Minimization with Dynamic Reweighting
Deep neural networks are often overparameterized and may not easily achi...

research  10/11/2022
Improving Sharpness-Aware Minimization with Fisher Mask for Better Generalization on Language Models
Fine-tuning large pretrained language models on a limited training corpu...

research  02/13/2023
Bi-directional Masks for Efficient N:M Sparse Training
We focus on addressing the dense backward propagation issue for training...

research  10/03/2020
Sharpness-Aware Minimization for Efficiently Improving Generalization
In today's heavily overparameterized models, the value of the training l...

research  11/21/2022
Efficient Generalization Improvement Guided by Random Weight Perturbation
To fully uncover the great potential of deep neural networks (DNNs), var...
