On the Role of Optimization in Double Descent: A Least Squares Study

07/27/2021
by Ilja Kuzborskij et al.

Empirically it has been observed that the performance of deep neural networks steadily improves as model size increases, contradicting the classical view of overfitting and generalization. Recently, the double descent phenomenon has been proposed to reconcile this observation with theory, suggesting that the test error undergoes a second descent once the model is sufficiently overparameterized, as model size itself acts as an implicit regularizer. In this paper we add to the growing body of work in this space, providing a careful study of learning dynamics as a function of model size in the least squares setting. We prove an excess risk bound for the gradient descent solution of the least squares objective. The bound depends on the smallest non-zero eigenvalue of the covariance matrix of the input features, via a functional form that exhibits the double descent behavior. This gives a new perspective on the double descent curves reported in the literature. Our analysis of the excess risk allows us to decouple the effects of optimization and generalization error. In particular, we find that in the case of noiseless regression, double descent is explained solely by optimization-related quantities, which was missed in studies focusing on the Moore-Penrose pseudoinverse solution. We believe that our derivation provides an alternative view to existing work, shedding some light on a possible cause of this phenomenon, at least in the least squares setting considered. We also empirically explore whether our predictions hold for neural networks, in particular whether the covariance of intermediate hidden activations behaves similarly to what our derivations predict.
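A minimal sketch (not the paper's code) of the setting the abstract describes, assuming a noiseless random-design regression with n = 50 samples, a varying feature count d, and a fixed step budget chosen for illustration: it runs plain gradient descent on the least squares objective and reports both the resulting test error and the smallest non-zero eigenvalue of the empirical feature covariance, the quantity the bound depends on, which dips near the interpolation threshold d ≈ n.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d_max, steps = 50, 150, 2000   # assumed toy values, not from the paper

    def run(d):
        # Gaussian design; the teacher uses all d_max coordinates, the model only d.
        X_full = rng.standard_normal((n, d_max))
        w_star = rng.standard_normal(d_max) / np.sqrt(d_max)
        y = X_full @ w_star                      # noiseless targets
        X = X_full[:, :d]

        cov = X.T @ X / n
        eigs = np.linalg.eigvalsh(cov)
        lam_min = eigs[eigs > 1e-10].min()       # smallest non-zero eigenvalue

        # Plain gradient descent on the least squares objective, started at zero.
        w = np.zeros(d)
        lr = 1.0 / eigs.max()                    # stable step size: 1 / largest eigenvalue
        for _ in range(steps):
            w -= lr * (X.T @ (X @ w - y)) / n

        # Test error on fresh data; with noiseless targets this is the excess risk.
        X_test = rng.standard_normal((10_000, d_max))
        test_err = np.mean((X_test[:, :d] @ w - X_test @ w_star) ** 2)
        return lam_min, test_err

    for d in (10, 25, 45, 50, 55, 75, 150):
        lam_min, err = run(d)
        print(f"d={d:4d}  lambda_min={lam_min:8.4f}  test_error={err:8.4f}")

In this toy setup the printed smallest non-zero eigenvalue shrinks as d approaches n and grows again once d exceeds n, which is the shape the excess risk bound in the paper attributes the double descent curve to; the exact numbers depend on the assumed constants above.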
