Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data

08/03/2018
by   Yuanzhi Li, et al.

Neural networks have many successful applications, but far less is understood about them theoretically. Towards bridging this gap, we study the problem of learning a two-layer overparameterized ReLU neural network for multi-class classification via stochastic gradient descent (SGD) from random initialization. In the overparameterized setting, when the data comes from mixtures of well-separated distributions, we prove that SGD learns a network with small generalization error, even though the network has enough capacity to fit arbitrary labels. Furthermore, the analysis provides interesting insights into several aspects of learning neural networks, and its predictions are verified by empirical studies on synthetic data and on the MNIST dataset.
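The setting in the abstract (an overparameterized two-layer ReLU network whose hidden layer is trained by SGD from random initialization, on data drawn from well-separated clusters) can be sketched as below. This is an illustrative toy reconstruction, not the paper's code: all function names, dimensions, and hyperparameters are assumptions, and following the paper's setup only the hidden layer is trained while the output layer stays fixed at random signs.

```python
import numpy as np

def make_mixture(n, d, k, sep=5.0, seed=0):
    # Sample n points from k well-separated Gaussian clusters in R^d.
    rng = np.random.default_rng(seed)
    centers = rng.normal(size=(k, d))
    centers *= sep / np.linalg.norm(centers, axis=1, keepdims=True)
    labels = rng.integers(0, k, size=n)
    X = centers[labels] + rng.normal(scale=0.5, size=(n, d))
    return X, labels

def train_sgd(X, y, k, m=512, lr=0.05, epochs=5, seed=0):
    # Two-layer ReLU net: trained hidden weights W, fixed random output layer A.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=1.0 / np.sqrt(d), size=(m, d))
    A = rng.choice([-1.0, 1.0], size=(k, m)) / np.sqrt(m)
    for _ in range(epochs):
        for i in rng.permutation(n):
            x = X[i]
            h = np.maximum(W @ x, 0.0)                        # ReLU activations
            logits = A @ h
            p = np.exp(logits - logits.max())
            p /= p.sum()                                      # softmax
            p[y[i]] -= 1.0                                    # cross-entropy gradient
            g = (A.T @ p) * (W @ x > 0)                       # backprop through ReLU
            W -= lr * np.outer(g, x)                          # SGD step, hidden layer only
    return W, A

def accuracy(W, A, X, y):
    logits = np.maximum(X @ W.T, 0.0) @ A.T
    return (logits.argmax(axis=1) == y).mean()
```

With m much larger than the number of clusters the network could memorize arbitrary labels, yet on such structured data plain SGD still finds weights that classify the mixture well, which is the phenomenon the paper analyzes.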

Related research

- 01/04/2021: Provable Generalization of SGD-trained Neural Networks of Any Width in the Presence of Adversarial Label Noise. "We consider a one-hidden-layer leaky ReLU network of arbitrary width tra..."
- 11/12/2018: Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers. "Neural networks have great success in many machine learning applications..."
- 04/14/2022: RankNEAT: Outperforming Stochastic Gradient Search in Preference Learning Tasks. "Stochastic gradient descent (SGD) is a premium optimization method for t..."
- 06/19/2021: Learning and Generalization in Overparameterized Normalizing Flows. "In supervised learning, it is known that overparameterized neural networ..."
- 02/22/2020: On the Inductive Bias of a CNN for Orthogonal Patterns Distributions. "Training overparameterized convolutional neural networks with gradient b..."
- 07/17/2019: On the geometry of solutions and on the capacity of multi-layer neural networks with ReLU activations. "Rectified Linear Units (ReLU) have become the main model for the neural ..."
- 05/28/2019: SGD on Neural Networks Learns Functions of Increasing Complexity. "We perform an experimental study of the dynamics of Stochastic Gradient ..."
