Systematic Investigation of Sparse Perturbed Sharpness-Aware Minimization Optimizer

06/30/2023
by Peng Mi, et al.

Deep neural networks often suffer from poor generalization due to complex and non-convex loss landscapes. Sharpness-Aware Minimization (SAM) is a popular solution that smooths the loss landscape by minimizing the maximized change in training loss when a perturbation is added to the weights. However, SAM's indiscriminate perturbation of all parameters is suboptimal and computationally expensive, doubling the overhead of common optimizers such as Stochastic Gradient Descent (SGD). In this paper, we propose Sparse SAM (SSAM), an efficient and effective training scheme that achieves sparse perturbation via a binary mask. To obtain the sparse mask, we provide two solutions based on Fisher information and dynamic sparse training, respectively. We investigate the impact of different mask patterns, including unstructured, structured, and N:M structured patterns, as well as explicit and implicit implementations of sparse perturbation. We theoretically prove that SSAM converges at the same rate as SAM, i.e., O(log T/√T). Sparse SAM thus has the potential to accelerate training while smoothing the loss landscape effectively. Extensive experiments on CIFAR and ImageNet-1K confirm that our method is more efficient than SAM, and that performance is preserved or even improved with a perturbation of merely 50% sparsity. Code is available at https://github.com/Mi-Peng/Systematic-Investigation-of-Sparse-Perturbed-Sharpness-Aware-Minimization-Optimizer.
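
To make the sparse-perturbation idea concrete, below is a minimal PyTorch-style sketch of a single SSAM-like step. It is not the authors' released implementation: the binary mask here is built from gradient magnitude as a stand-in for the Fisher-information or dynamic-sparse-training masks described in the abstract, and the names ssam_step, rho, and sparsity are illustrative choices.

    # Minimal sketch of one sparse-perturbed SAM step in PyTorch.
    # Assumptions (not from the paper's code): the binary mask keeps the
    # largest-magnitude gradient entries as a proxy for a Fisher mask;
    # rho and sparsity are illustrative hyperparameters.
    import torch

    def ssam_step(model, loss_fn, data, target, base_opt, rho=0.05, sparsity=0.5):
        # 1) Gradient at the current weights.
        base_opt.zero_grad()
        loss_fn(model(data), target).backward()

        params = [p for p in model.parameters() if p.grad is not None]
        grads = [p.grad.detach().clone() for p in params]

        # 2) Binary mask keeping the top (1 - sparsity) fraction of entries
        #    by gradient magnitude (stand-in for the Fisher-based mask).
        flat = torch.cat([g.abs().flatten() for g in grads])
        k = max(1, int((1.0 - sparsity) * flat.numel()))
        threshold = torch.topk(flat, k, largest=True).values.min()
        masks = [(g.abs() >= threshold).float() for g in grads]

        # 3) Sparse perturbation: eps = rho * mask * grad / ||grad||.
        grad_norm = torch.norm(torch.stack([g.norm(p=2) for g in grads]), p=2)
        eps = [rho * m * g / (grad_norm + 1e-12) for m, g in zip(masks, grads)]
        with torch.no_grad():
            for p, e in zip(params, eps):
                p.add_(e)

        # 4) Gradient at the perturbed weights, then undo the perturbation.
        base_opt.zero_grad()
        loss_fn(model(data), target).backward()
        with torch.no_grad():
            for p, e in zip(params, eps):
                p.sub_(e)

        # 5) Update the original weights with the base optimizer (e.g. SGD).
        base_opt.step()

With sparsity=0.5, only half of the weight coordinates receive a perturbation in step 3, which is the source of the efficiency gain relative to vanilla SAM; setting sparsity=0.0 recovers the standard (dense) SAM update.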
