The Dynamics of Sharpness-Aware Minimization: Bouncing Across Ravines and Drifting Towards Wide Minima

10/04/2022
by Peter L. Bartlett, et al.

We consider Sharpness-Aware Minimization (SAM), a gradient-based optimization method for deep networks that has exhibited performance improvements on image and language prediction problems. We show that when SAM is applied with a convex quadratic objective, for most random initializations it converges to a cycle that oscillates between either side of the minimum in the direction with the largest curvature, and we provide bounds on the rate of convergence. In the non-quadratic case, we show that such oscillations effectively perform gradient descent, with a smaller step-size, on the spectral norm of the Hessian. In such cases, SAM's update may be regarded as a third derivative – the derivative of the Hessian in the leading eigenvector direction – that encourages drift toward wider minima.
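The quadratic-case claim can be checked numerically. The sketch below is illustrative only and is not taken from the paper: it assumes the common normalized form of the SAM update, w ← w − η ∇L(w + ρ ∇L(w)/‖∇L(w)‖), applied to a diagonal quadratic L(w) = ½ Σ λ_i w_i². The curvatures, step size `eta`, radius `rho`, and the helper names `sam_step` and `run_sam` are all hypothetical choices. Under that assumption, restricting the update to the leading direction gives the one-dimensional map w ← (1 − ηλ)w − ηλρ·sign(w), whose two-cycle has amplitude ρηλ/(2 − ηλ); the code compares the simulated iterates with this value.

```python
import numpy as np


def sam_step(w, curvatures, eta, rho, eps=1e-12):
    """One SAM step on L(w) = 0.5 * sum(curvatures * w**2) (assumed normalized form)."""
    grad = curvatures * w                                   # gradient at w
    w_adv = w + rho * grad / (np.linalg.norm(grad) + eps)   # ascent step of length ~rho
    return w - eta * curvatures * w_adv                     # descend using gradient at w_adv


def run_sam(w0, curvatures, eta, rho, steps):
    """Iterate SAM and return the whole trajectory as a (steps + 1, dim) array."""
    w = np.asarray(w0, dtype=float)
    trajectory = [w.copy()]
    for _ in range(steps):
        w = sam_step(w, curvatures, eta, rho)
        trajectory.append(w.copy())
    return np.array(trajectory)


# Hypothetical problem: the largest curvature sits on the first coordinate.
curvatures = np.array([4.0, 1.0, 0.25])
eta, rho = 0.1, 0.05

rng = np.random.default_rng(0)
trajectory = run_sam(rng.standard_normal(3), curvatures, eta, rho, steps=2000)

lam_max = curvatures[0]
# Two-cycle amplitude of the 1-D map along the leading direction (derived in the lead-in).
predicted_amplitude = rho * eta * lam_max / (2.0 - eta * lam_max)

print("last two iterates:\n", trajectory[-2:])
print("predicted amplitude along leading direction:", predicted_amplitude)
```

In this setup the last two iterates should sit on opposite sides of the minimum along the first coordinate, with the remaining coordinates decayed towards zero. This illustrates the bouncing behaviour described in the abstract; it does not reproduce the paper's convergence-rate bounds.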

research
03/03/2023

Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization

Recently, flat minima are proven to be effective for improving generaliz...
research
12/30/2017

Theory of Deep Learning III: explaining the non-overfitting puzzle

A main puzzle of deep networks revolves around the absence of overfittin...
research
04/06/2021

A Caputo fractional derivative-based algorithm for optimization

We propose a novel Caputo fractional derivative-based optimization algor...
research
08/06/2023

The Effect of SGD Batch Size on Autoencoder Learning: Sparsity, Sharpness, and Feature Learning

In this work, we investigate the dynamics of stochastic gradient descent...
research
09/14/2017

The Impact of Local Geometry and Batch Size on the Convergence and Divergence of Stochastic Gradient Descent

Stochastic small-batch (SB) methods, such as mini-batch Stochastic Gradi...
research
06/12/2020

Complex Dynamics in Simple Neural Networks: Understanding Gradient Flow in Phase Retrieval

Despite the widespread use of gradient-based algorithms for optimizing h...
research
09/25/2018

Hessian barrier algorithms for linearly constrained optimization problems

In this paper, we propose an interior-point method for linearly constrai...
