The Dynamics of Sharpness-Aware Minimization: Bouncing Across Ravines and Drifting Towards Wide Minima

by   Peter L. Bartlett, et al.

We consider Sharpness-Aware Minimization (SAM), a gradient-based optimization method for deep networks that has exhibited performance improvements on image and language prediction problems. We show that when SAM is applied with a convex quadratic objective, for most random initializations it converges to a cycle that oscillates between either side of the minimum in the direction with the largest curvature, and we provide bounds on the rate of convergence. In the non-quadratic case, we show that such oscillations effectively perform gradient descent, with a smaller step-size, on the spectral norm of the Hessian. In such cases, SAM's update may be regarded as a third derivative – the derivative of the Hessian in the leading eigenvector direction – that encourages drift toward wider minima.


page 1

page 2

page 3

page 4


Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization

Recently, flat minima are proven to be effective for improving generaliz...

Theory of Deep Learning III: explaining the non-overfitting puzzle

A main puzzle of deep networks revolves around the absence of overfittin...

A Caputo fractional derivative-based algorithm for optimization

We propose a novel Caputo fractional derivative-based optimization algor...

The Effect of SGD Batch Size on Autoencoder Learning: Sparsity, Sharpness, and Feature Learning

In this work, we investigate the dynamics of stochastic gradient descent...

The Impact of Local Geometry and Batch Size on the Convergence and Divergence of Stochastic Gradient Descent

Stochastic small-batch (SB) methods, such as mini-batch Stochastic Gradi...

Complex Dynamics in Simple Neural Networks: Understanding Gradient Flow in Phase Retrieval

Despite the widespread use of gradient-based algorithms for optimizing h...

Hessian barrier algorithms for linearly constrained optimization problems

In this paper, we propose an interior-point method for linearly constrai...

Please sign up or login with your details

Forgot password? Click here to reset