Sharpness-Aware Minimization Leads to Low-Rank Features

05/25/2023
by Maksym Andriushchenko et al.

Sharpness-aware minimization (SAM) is a recently proposed method that minimizes the sharpness of the training loss of a neural network. While its generalization improvement is well known and is the primary motivation, we uncover an additional intriguing effect of SAM: a reduction of the feature rank that occurs at different layers of a neural network. We show that this low-rank effect occurs very broadly: for different architectures such as fully-connected networks, convolutional networks, and vision transformers, and for different objectives such as regression, classification, and language-image contrastive training. To better understand this phenomenon, we provide a mechanistic account of how low-rank features arise in a simple two-layer network. We observe that a significant number of activations get entirely pruned by SAM, which directly contributes to the rank reduction. We confirm this effect theoretically and check that it can also occur in deep networks, although the overall rank-reduction mechanism can be more complex, especially for deep networks with pre-activation skip connections and self-attention layers. We make our code available at https://github.com/tml-epfl/sam-low-rank-features.
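
The two ingredients behind this observation can be sketched in a few lines of code. Below is a minimal illustration, assuming PyTorch; the names `sam_step` and `feature_rank`, the perturbation radius `rho`, and the 99%-variance threshold are illustrative choices, not the authors' exact implementation. The first function performs one SAM update, ascending to an approximate worst-case weight perturbation of radius `rho` before taking the descent step; the second measures a simple notion of feature rank as the number of principal directions needed to explain a given fraction of the variance of intermediate features.

```python
# Minimal sketch (not the paper's code): a single SAM update step and a
# simple feature-rank probe. Assumes a standard PyTorch model/optimizer.
import torch

def sam_step(model, loss_fn, x, y, optimizer, rho=0.05):
    optimizer.zero_grad()

    # 1) Gradient at the current weights.
    loss_fn(model(x), y).backward()
    params = [p for p in model.parameters() if p.grad is not None]

    # 2) Ascend to the approximate worst-case weights within an L2 ball of radius rho.
    with torch.no_grad():
        grad_norm = torch.norm(torch.stack([p.grad.norm() for p in params]))
        eps = [rho * p.grad / (grad_norm + 1e-12) for p in params]
        for p, e in zip(params, eps):
            p.add_(e)

    # 3) Gradient at the perturbed weights, then undo the perturbation and step.
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)
    optimizer.step()
    optimizer.zero_grad()


def feature_rank(features, threshold=0.99):
    # Number of principal directions needed to explain `threshold` of the
    # variance of a (batch, dim) matrix of intermediate features.
    s = torch.linalg.svdvals(features - features.mean(dim=0, keepdim=True))
    explained = torch.cumsum(s**2, dim=0) / torch.sum(s**2)
    return int((explained < threshold).sum().item()) + 1
```

In this kind of setup, one would call `sam_step` once per mini-batch and track `feature_rank` of activations at several layers over training to observe the qualitative rank-reduction effect described above.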

Related research

- Trained Rank Pruning for Efficient Deep Neural Networks (10/09/2019): To accelerate DNNs inference, low-rank approximation has been widely ado...
- Rank Diminishing in Deep Neural Networks (06/13/2022): The rank of neural networks measures information flowing across layers. ...
- Cuttlefish: Low-Rank Model Training without All the Tuning (05/04/2023): Recent research has shown that training low-rank neural networks can eff...
- Compressing Neural Networks: Towards Determining the Optimal Layer-wise Decomposition (07/23/2021): We present a novel global compression framework for deep neural networks...
- Exploring Low Rank Training of Deep Neural Networks (09/27/2022): Training deep neural networks in low rank, i.e. with factorised layers, ...
- FLuRKA: Fast fused Low-Rank Kernel Attention (06/27/2023): Many efficient approximate self-attention techniques have become prevale...
- Compressed Deep Networks: Goodbye SVD, Hello Robust Low-Rank Approximation (09/11/2020): A common technique for compressing a neural network is to compute the k-...
