Bounding the Width of Neural Networks via Coupled Initialization – A Worst Case Analysis

06/26/2022
by   Alexander Munteanu, et al.

A common method in training neural networks is to initialize all the weights to be independent Gaussian vectors. We observe that by instead initializing the weights in independent pairs, where each pair consists of two identical Gaussian vectors, we can significantly improve the convergence analysis. While a similar technique has been studied for random inputs [Daniely, NeurIPS 2020], it has not been analyzed with arbitrary inputs. Using this technique, we show how to significantly reduce the number of neurons required for two-layer ReLU networks, both in the under-parameterized setting with logistic loss, from roughly γ^-8 [Ji and Telgarsky, ICLR 2020] to γ^-2, where γ denotes the separation margin with respect to the Neural Tangent Kernel, as well as in the over-parameterized setting with squared loss, from roughly n^4 [Song and Yang, 2019] to n^2, implicitly also improving the recent running time bound of [Brand, Peng, Song and Weinstein, ITCS 2021]. For the under-parameterized setting we also prove new lower bounds that improve upon prior work, and that under certain assumptions, are best possible.
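To make the coupled (paired) initialization concrete, here is a minimal NumPy sketch of one common form of the idea for a two-layer ReLU network: the 2m hidden weight vectors are drawn as m i.i.d. Gaussian vectors, each used twice, and the two copies of a pair receive opposite output-layer signs, so the network output is identically zero at initialization. The function names and the 1/sqrt(2m) output scaling are illustrative assumptions, not necessarily the exact construction or constants used in the paper.

```python
import numpy as np

def coupled_init_two_layer(d, m, seed=None):
    """Sketch of coupled initialization for a two-layer ReLU network
    with 2m hidden neurons and d-dimensional inputs.

    Instead of drawing 2m independent Gaussian weight vectors, draw m
    of them and use each one twice; the two copies in a pair get
    opposite output-layer signs, so the paired contributions cancel
    and the network output is zero at initialization.
    (Scaling conventions here are an assumption for illustration.)
    """
    rng = np.random.default_rng(seed)
    W_half = rng.standard_normal((m, d))      # m i.i.d. Gaussian vectors
    W = np.vstack([W_half, W_half])           # each vector appears in a pair
    a = np.concatenate([np.ones(m), -np.ones(m)]) / np.sqrt(2 * m)
    return W, a

def forward(x, W, a):
    """f(x) = sum_r a_r * ReLU(<w_r, x>) for the two-layer network."""
    return a @ np.maximum(W @ x, 0.0)

# At initialization, paired neurons cancel exactly: f(x) == 0 for every x.
W, a = coupled_init_two_layer(d=10, m=64, seed=0)
x = np.random.default_rng(1).standard_normal(10)
assert abs(forward(x, W, a)) < 1e-12
```

The cancellation holds because each pair contributes a_r ReLU(<w_r, x>) + a_{r+m} ReLU(<w_r, x>) = 0, which is the property the coupled initialization exploits in the convergence analysis.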


