On the Effect of Initialization: The Scaling Path of 2-Layer Neural Networks

03/31/2023
by Sebastian Neumayer, et al.

In supervised learning, the regularization path is sometimes used as a convenient theoretical proxy for the optimization path of gradient descent initialized at zero. In this paper, we study a modification of the regularization path for infinite-width 2-layer ReLU neural networks with a non-zero initial distribution of the weights at different scales. By exploiting a link with unbalanced optimal transport theory, we show that, despite the non-convexity of 2-layer network training, this problem admits an infinite-dimensional convex counterpart. We formulate the corresponding functional optimization problem and investigate its main properties. In particular, we show that as the scale of the initialization ranges between 0 and +∞, the associated path interpolates continuously between the so-called kernel and rich regimes. Numerical experiments confirm that, in our setting, the scaling path and the final states of the optimization path behave similarly even beyond these extreme points.
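As a rough illustration of the kernel-versus-rich dichotomy (not the paper's code or its exact objective), here is a minimal NumPy sketch that trains a 2-layer ReLU network on a toy 1-D regression task from random initializations scaled by a factor tau, and reports how far the weights move relative to their initial norm. The width m, learning rate, toy target, and 1/m output scaling are assumptions made for the example; loosely, large tau tends toward lazy, kernel-like training (small relative weight movement), while small tau permits the rich, feature-learning regime.

```python
# Hypothetical sketch: 2-layer ReLU network trained by plain gradient descent
# from initial weights scaled by tau, on a toy 1-D regression problem.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: points from a piecewise-linear target.
x = np.linspace(-1.0, 1.0, 20)
y = np.abs(x) - 0.5

m = 200        # hidden width (assumption for the example)
lr = 1e-2      # step size
steps = 5000


def train(tau):
    # Initial weights drawn at scale tau (assumed Gaussian initialization).
    w = tau * rng.normal(size=m)   # inner weights
    b = tau * rng.normal(size=m)   # biases
    a = tau * rng.normal(size=m)   # outer weights
    w0, b0, a0 = w.copy(), b.copy(), a.copy()

    for _ in range(steps):
        pre = np.outer(x, w) + b        # (n, m) pre-activations
        h = np.maximum(pre, 0.0)        # ReLU features
        pred = h @ a / m                # mean-field 1/m output scaling
        r = pred - y                    # residuals

        # Gradients of the mean squared error.
        grad_a = h.T @ r / (m * len(x))
        grad_pre = (r[:, None] * (pre > 0) * a / m) / len(x)
        grad_w = x @ grad_pre
        grad_b = grad_pre.sum(axis=0)

        a -= lr * grad_a
        w -= lr * grad_w
        b -= lr * grad_b

    # How far the parameters moved, relative to their initial norm.
    move = np.sqrt(np.sum((w - w0) ** 2 + (b - b0) ** 2 + (a - a0) ** 2))
    init = np.sqrt(np.sum(w0 ** 2 + b0 ** 2 + a0 ** 2))
    loss = 0.5 * np.mean((np.maximum(np.outer(x, w) + b, 0.0) @ a / m - y) ** 2)
    return loss, move / init


for tau in [0.1, 1.0, 10.0]:
    loss, rel_move = train(tau)
    print(f"tau={tau:5.1f}  final loss={loss:.4f}  relative weight movement={rel_move:.3f}")
```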

Related research

04/14/2022 · Convergence and Implicit Regularization Properties of Gradient Descent for Deep Residual Networks
We prove linear convergence of gradient descent to a global minimum for ...

10/29/2021 · Training Integrable Parameterizations of Deep Neural Networks in the Infinite-Width Limit
To theoretically understand the behavior of trained deep neural networks...

06/10/2020 · All Local Minima are Global for Two-Layer ReLU Neural Networks: The Hidden Convex Optimization Landscape
We are interested in two-layer ReLU neural networks from an optimization...

10/18/2021 · Path Regularization: A Convexity and Sparsity Inducing Regularization for Parallel ReLU Networks
Despite several attempts, the fundamental mechanisms behind the success ...

02/02/2022 · Fast Convex Optimization for Two-Layer ReLU Networks: Equivalent Model Classes and Cone Decompositions
We develop fast algorithms and robust software for convex optimization o...

02/20/2020 · Kernel and Rich Regimes in Overparametrized Models
A recent line of work studies overparametrized neural networks in the "k...

08/04/2022 · Feature selection with gradient descent on two-layer networks in low-rotation regimes
This work establishes low test error of gradient flow (GF) and stochasti...
