Sharpness-Aware Minimization Leads to Low-Rank Features

by Maksym Andriushchenko, et al.

Sharpness-aware minimization (SAM) is a recently proposed method that minimizes the sharpness of the training loss of a neural network. While its generalization improvement is well known and is the primary motivation, we uncover an additional intriguing effect of SAM: a reduction of the feature rank that occurs at different layers of a neural network. We show that this low-rank effect occurs very broadly: for different architectures such as fully-connected networks, convolutional networks, and vision transformers, and for different objectives such as regression, classification, and language-image contrastive training. To better understand this phenomenon, we provide a mechanistic understanding of how low-rank features arise in a simple two-layer network. We observe that a significant number of activations get entirely pruned by SAM, which directly contributes to the rank reduction. We confirm this effect theoretically and check that it can also occur in deep networks, although the overall rank reduction mechanism can be more complex, especially for deep networks with pre-activation skip connections and self-attention layers. We make our code available at
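As a rough illustration of the feature rank being discussed (this is not the authors' code; the energy threshold and function names are assumptions), one common way to estimate the effective rank of a layer's features is to count how many singular values are needed to capture most of the spectral energy of the activation matrix:

```python
import numpy as np

def feature_rank(features, energy=0.99):
    """Estimate the effective rank of a (n_samples, n_dims) feature matrix
    as the smallest k whose top-k singular values capture `energy` of the
    total squared-singular-value mass (a common heuristic, not the paper's
    exact metric)."""
    s = np.linalg.svd(features - features.mean(axis=0), compute_uv=False)
    cum = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(cum, energy) + 1)

rng = np.random.default_rng(0)
# A matrix with true rank 8 vs. a full-rank Gaussian matrix of the same shape.
low_rank = rng.normal(size=(256, 8)) @ rng.normal(size=(8, 64))
full_rank = rng.normal(size=(256, 64))
print(feature_rank(low_rank), feature_rank(full_rank))
```

Under this kind of metric, a SAM-trained network's intermediate activations would yield a noticeably smaller effective rank than those of a standard SGD-trained network, which is the effect the paper quantifies.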


Trained Rank Pruning for Efficient Deep Neural Networks

To accelerate DNNs inference, low-rank approximation has been widely ado...

Rank Diminishing in Deep Neural Networks

The rank of neural networks measures information flowing across layers. ...

Cuttlefish: Low-Rank Model Training without All the Tuning

Recent research has shown that training low-rank neural networks can eff...

Compressing Neural Networks: Towards Determining the Optimal Layer-wise Decomposition

We present a novel global compression framework for deep neural networks...

Exploring Low Rank Training of Deep Neural Networks

Training deep neural networks in low rank, i.e. with factorised layers, ...

FLuRKA: Fast fused Low-Rank Kernel Attention

Many efficient approximate self-attention techniques have become prevale...

Compressed Deep Networks: Goodbye SVD, Hello Robust Low-Rank Approximation

A common technique for compressing a neural network is to compute the k-...
