Subquadratic Overparameterization for Shallow Neural Networks

11/02/2021
by ChaeHwan Song, et al.

Overparameterization refers to the important phenomenon where the width of a neural network is chosen so that learning algorithms can provably attain zero loss in nonconvex training. The existing theory establishes such global convergence using various initialization strategies, training modifications, and width scalings. In particular, the state-of-the-art results require the width to scale quadratically with the number of training samples under standard initialization strategies used in practice for best generalization performance. In contrast, the most recent results obtain linear scaling either by requiring initializations that lead to "lazy training" or by training only a single layer. In this work, we provide an analytical framework that allows us to adopt standard initialization strategies, possibly avoid lazy training, and train all layers simultaneously in basic shallow neural networks while attaining a desirable subquadratic scaling of the network width. We achieve these desiderata via the Polyak-Lojasiewicz condition, smoothness, and standard assumptions on the data, using tools from random matrix theory.
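For reference, the two analytical ingredients named in the abstract can be stated in their standard forms; this is a generic sketch, not the paper's precise conditions, and the constants \mu and \beta below are illustrative. A training loss f with minimum value f^\ast satisfies the Polyak-Lojasiewicz (PL) condition with constant \mu > 0 and is \beta-smooth if

    \tfrac{1}{2}\,\|\nabla f(w)\|^2 \;\ge\; \mu\,\bigl(f(w) - f^\ast\bigr), \qquad \|\nabla f(w) - \nabla f(v)\| \;\le\; \beta\,\|w - v\|.

Under these two conditions, gradient descent with step size \eta = 1/\beta enjoys the standard linear convergence

    f(w_{t+1}) - f^\ast \;\le\; \Bigl(1 - \tfrac{\mu}{\beta}\Bigr)\bigl(f(w_t) - f^\ast\bigr),

so, loosely speaking, the role of the width requirement in analyses of this kind is to keep an effective PL constant of the shallow network's loss bounded away from zero along the training trajectory.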


Related research

10/14/2019 - Effects of Depth, Width, and Initialization: A Convergence Analysis of Layer-wise Training for Deep Linear Neural Networks
Deep neural networks have been used in various machine learning applicat...

01/24/2019 - Width Provably Matters in Optimization for Deep Linear Neural Networks
We prove that for an L-layer fully-connected linear neural network, if t...

09/15/2022 - Robustness in deep learning: The good (width), the bad (depth), and the ugly (initialization)
We study the average robustness notion in deep neural networks in (selec...

03/27/2022 - On the Neural Tangent Kernel Analysis of Randomly Pruned Wide Neural Networks
We study the behavior of ultra-wide neural networks when their weights a...

12/11/2019 - Is Feature Diversity Necessary in Neural Network Initialization?
Standard practice in training neural networks involves initializing the ...

04/24/2020 - Nonconvex penalization for sparse neural networks
Training methods for artificial neural networks often rely on over-param...

04/21/2021 - Deep limits and cut-off phenomena for neural networks
We consider dynamical and geometrical aspects of deep learning. For many...