Penalising the biases in norm regularisation enforces sparsity

03/02/2023
by Etienne Boursier et al.

Controlling the norm of the parameters often yields good generalisation when training neural networks. Beyond simple intuitions, however, the relation between the parameters' norm and the resulting estimators remains poorly understood theoretically. For networks with a single hidden ReLU layer and unidimensional data, this work shows that the minimal parameters' norm required to represent a function is given by the total variation of its second derivative, weighted by a √(1+x^2) factor. By comparison, this √(1+x^2) weighting disappears when the norm of the bias terms is ignored. This additional weighting is of crucial importance, since it is shown to enforce uniqueness and sparsity (in the number of kinks) of the minimal norm interpolator, whereas omitting the bias norm allows for non-sparse solutions. Penalising the bias terms in the regularisation, either explicitly or implicitly, thus leads to sparse estimators. This sparsity may contribute to the good generalisation of neural networks that is observed empirically.
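As a rough sketch of the characterisation stated above (the symbol R(f) and the omission of any term for the affine part of f are our own simplifications, not taken from the paper), the minimal parameters' norm needed to represent a function f of unidimensional data can be written as

    R(f) = ∫ √(1+x^2) d|f''|(x),

where |f''| denotes the total-variation measure of the distributional second derivative of f. To see where the weighting comes from, consider a finite network f(x) = Σ_j a_j ReLU(w_j x + b_j): its second derivative places a mass of size |a_j w_j| at each kink x_j = -b_j/w_j, so each neuron contributes |a_j w_j| √(1 + x_j^2) = |a_j| √(w_j^2 + b_j^2) to the weighted total variation, which is exactly its per-neuron parameter norm after optimal rescaling when the bias is included in the penalty. If the bias norm is ignored, the contribution reduces to |a_j w_j| and the √(1+x^2) weighting disappears, consistent with the comparison made in the abstract.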


