Harmless Overparametrization in Two-layer Neural Networks

06/09/2021
by Huiyuan Wang et al.

Overparametrized neural networks, in which the number of active parameters exceeds the sample size, prove remarkably effective in modern deep learning practice. From the classical perspective, however, far fewer parameters suffice for optimal estimation and prediction, and overparametrization can be harmful even in the presence of explicit regularization. To reconcile this conflict, we present a generalization theory for overparametrized ReLU networks that incorporates an explicit regularizer based on the scaled variation norm. Interestingly, this regularizer is equivalent to ridge regularization from the viewpoint of gradient-based optimization, yet acts like the group lasso in controlling model complexity. By exploiting this ridge-lasso duality, we show that overparametrization is generally harmless to two-layer ReLU networks; in particular, the overparametrized estimators are minimax optimal up to a logarithmic factor. By contrast, we show that overparametrized random feature models suffer from the curse of dimensionality and are therefore suboptimal.
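A minimal sketch of the ridge-lasso duality mentioned in the abstract, under the standard two-layer parametrization f(x) = Σ_j a_j ReLU(w_j·x) (the paper's exact scaled variation norm is not spelled out here, so this is illustrative): because ReLU is positively homogeneous, each neuron can be rescaled without changing the network, and minimizing the ridge (weight-decay) penalty over such rescalings yields the group-lasso-like quantity Σ_j |a_j|·||w_j||₂.

```python
# Illustrative sketch (not the paper's code): for a two-layer ReLU network
# f(x) = sum_j a_j * relu(w_j . x), positive homogeneity lets us rescale
# (a_j, w_j) -> (a_j / c_j, c_j * w_j) without changing f. Minimizing the
# ridge penalty 0.5 * sum_j (a_j^2 + ||w_j||^2) over such rescalings gives
# the group-lasso-style quantity sum_j |a_j| * ||w_j||_2, i.e. a variation
# (path) norm; the paper's scaled variation norm is assumed to be of this flavor.
import numpy as np

rng = np.random.default_rng(0)
m, d = 8, 3                      # hidden width, input dimension
a = rng.standard_normal(m)       # outer-layer weights
W = rng.standard_normal((m, d))  # inner-layer weights, one row per neuron

def ridge_penalty(a, W):
    """0.5 * (||a||^2 + ||W||_F^2): the weight-decay (ridge) form."""
    return 0.5 * (np.sum(a**2) + np.sum(W**2))

def variation_penalty(a, W):
    """sum_j |a_j| * ||w_j||_2: the group-lasso (variation-norm) form."""
    return np.sum(np.abs(a) * np.linalg.norm(W, axis=1))

# Balancing each neuron (|a_j| = ||w_j||) minimizes the ridge penalty among
# all rescalings that leave the network function unchanged.
row_norms = np.linalg.norm(W, axis=1)
c = np.sqrt(np.abs(a) / row_norms)       # optimal per-neuron rescaling
a_bal, W_bal = a / c, W * c[:, None]

print(ridge_penalty(a_bal, W_bal))       # equals the variation penalty ...
print(variation_penalty(a, W))           # ... which is rescaling-invariant
```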


