The Inductive Bias of Flatness Regularization for Deep Matrix Factorization

06/22/2023
by Khashayar Gatmiry, et al.

Recent work on over-parameterized neural networks has shown that the stochasticity in optimizers has the implicit regularization effect of minimizing the sharpness of the loss function (in particular, the trace of its Hessian) over the family of zero-loss solutions. More explicit forms of flatness regularization also empirically improve generalization. However, it remains unclear why and when flatness regularization leads to better generalization. This work takes the first step toward understanding the inductive bias of minimum-Hessian-trace solutions in an important setting: learning deep linear networks from linear measurements, also known as deep matrix factorization. We show that for every depth greater than one, under the standard Restricted Isometry Property (RIP) on the measurements, minimizing the trace of the Hessian is approximately equivalent to minimizing the Schatten 1-norm of the corresponding end-to-end matrix (i.e., the product of all layer matrices), which in turn leads to better generalization. We empirically verify our theoretical findings on synthetic datasets.
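To make the objects in the abstract concrete, below is a minimal numeric sketch of the deep matrix factorization setting: layer matrices are multiplied into an end-to-end matrix, its Schatten 1-norm (nuclear norm) is computed, and a squared loss over random Gaussian linear measurements is formed. The dimensions n, depth, and measurement count m are illustrative choices, not values from the paper, and the code only sets up the quantities the result relates; it does not reproduce the paper's analysis or its Hessian-trace minimization.

```python
import numpy as np

rng = np.random.default_rng(0)

n, depth = 10, 3          # matrix size and network depth (illustrative values)
m = 200                   # number of linear measurements (illustrative value)

# A deep linear network parameterized by layer matrices W_1, ..., W_L.
layers = [rng.standard_normal((n, n)) / np.sqrt(n) for _ in range(depth)]

# End-to-end matrix: the product of all layer matrices, M = W_L ... W_1.
M = layers[0]
for W in layers[1:]:
    M = W @ M

# Schatten 1-norm (nuclear norm) of the end-to-end matrix: the sum of its
# singular values, i.e., the quantity the paper shows is approximately
# minimized by minimum-Hessian-trace zero-loss solutions.
schatten_1 = np.linalg.svd(M, compute_uv=False).sum()
print(f"Schatten 1-norm of end-to-end matrix: {schatten_1:.3f}")

# Linear measurements y_i = <A_i, M>; i.i.d. Gaussian A_i satisfy RIP with
# high probability once m is large enough relative to the rank of M.
A = rng.standard_normal((m, n, n)) / np.sqrt(m)
y = np.einsum('mij,ij->m', A, M)

# Squared-measurement loss whose zero-loss set the flatness bias selects from.
def loss(M_hat):
    residual = np.einsum('mij,ij->m', A, M_hat) - y
    return 0.5 * np.sum(residual ** 2)

print(f"loss at the true end-to-end matrix: {loss(M):.2e}")
```

The Schatten 1-norm is the standard convex surrogate for rank, which is why a bias toward minimizing it over zero-loss solutions plausibly explains better generalization in low-rank matrix recovery.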

