On the Convergence of Shallow Neural Network Training with Randomly Masked Neurons

12/05/2021
by   Fangshuo Liao, et al.

Given a dense shallow neural network, we focus on iteratively creating, training, and combining randomly selected subnetworks (surrogate functions), towards training the full model. By carefully analyzing i) the subnetworks' neural tangent kernel, ii) the surrogate functions' gradient, and iii) how we sample and combine the surrogate functions, we prove a linear convergence rate of the training error, up to an error region, for an overparameterized single-hidden-layer perceptron with ReLU activations on a regression task. Our result implies that, for a fixed neuron-selection probability, the error term decreases as we increase the number of surrogate models, and increases as we increase the number of local training steps for each selected subnetwork. The considered framework generalizes and provides new insights on dropout training, multi-sample dropout training, and Independent Subnet Training; for each case, we provide corresponding convergence results as corollaries of our main theorem.
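
The training scheme described above (sample a random neuron mask, run a few local gradient steps on the resulting subnetwork, then combine the surrogate updates back into the full model) can be illustrated with a short NumPy sketch. This is a minimal illustration under assumed settings, not the paper's implementation: the width, learning rate, neuron-selection probability `p`, number of surrogates `S`, and local-step count `K` are placeholder values, and the output layer is held fixed as is common in NTK-style analyses.

```python
# Minimal sketch: training a one-hidden-layer ReLU network for regression
# by repeatedly sampling masked subnetworks (surrogates), training each one
# locally, and averaging the surrogate updates into the full model.
# All hyperparameters below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

d, m, n = 5, 256, 100          # input dim, hidden width, number of samples
p, S, K, lr = 0.5, 4, 3, 1e-2  # selection prob., surrogates, local steps, step size

X = rng.standard_normal((n, d))
y = np.sin(X @ rng.standard_normal(d))            # synthetic regression targets

W = rng.standard_normal((d, m)) / np.sqrt(d)      # hidden-layer weights (trained)
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)  # output weights (kept fixed)

def loss_and_grad(W, mask):
    """Squared loss and gradient of the subnetwork selected by `mask`."""
    H = np.maximum(X @ W, 0.0) * mask             # masked hidden activations
    err = H @ a - y
    # Gradient w.r.t. W, restricted to the selected (unmasked) neurons.
    grad = X.T @ ((err[:, None] * a[None, :]) * ((X @ W) > 0) * mask)
    return 0.5 * np.mean(err ** 2), grad / n

for t in range(50):                               # outer (global) iterations
    update = np.zeros_like(W)
    for s in range(S):                            # train S surrogate subnetworks
        mask = (rng.random(m) < p).astype(float)  # sample neurons w.p. p
        W_s = W.copy()
        for k in range(K):                        # K local gradient steps
            _, g = loss_and_grad(W_s, mask)
            W_s -= lr * g
        update += (W_s - W) / S                   # average the surrogate updates
    W += update                                   # combine into the full model
    full_loss, _ = loss_and_grad(W, np.ones(m))
    if t % 10 == 0:
        print(f"iter {t:3d}  full-model loss {full_loss:.4f}")
```

With `S = 1`, `K = 1`, and a fresh mask at every step, the loop reduces to a dropout-style update; letting `K` grow recovers an Independent-Subnet-Training-style scheme with longer local training between combinations.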

Related research

10/23/2020 · On Convergence and Generalization of Dropout Training
We study dropout in two-layer neural networks with rectified linear unit...

09/21/2019 · ASNI: Adaptive Structured Noise Injection for shallow and deep neural networks
Dropout is a regularisation technique in neural network training where u...

12/01/2020 · Asymptotic convergence rate of Dropout on shallow linear neural networks
We analyze the convergence rate of gradient flows on objective functions...

05/15/2019 · Rethinking the Usage of Batch Normalization and Dropout in the Training of Deep Neural Networks
In this work, we propose a novel technique to boost training efficiency ...

02/20/2023 · Over-Parameterization Exponentially Slows Down Gradient Descent for Learning a Single Neuron
We revisit the problem of learning a single neuron with ReLU activation ...

03/06/2015 · To Drop or Not to Drop: Robustness, Consistency and Differential Privacy Properties of Dropout
Training deep belief networks (DBNs) requires optimizing a non-convex fu...

12/17/2021 · On the existence of global minima and convergence analyses for gradient descent methods in the training of deep neural networks
In this article we study fully-connected feedforward deep ReLU ANNs with...
