Mean-Field Analysis of Two-Layer Neural Networks: Global Optimality with Linear Convergence Rates

05/19/2022
by Jingwei Zhang, et al.

We consider optimizing two-layer neural networks in the mean-field regime, where the learning dynamics of the network weights can be approximated by an evolution in the space of probability measures over the weight parameters associated with the neurons. The mean-field regime is a theoretically attractive alternative to the NTK (lazy training) regime, whose dynamics are confined to a local region of the so-called neural tangent kernel space around specialized initializations. Several prior works (<cit.>) establish the asymptotic global optimality of the mean-field regime, but obtaining a quantitative convergence rate remains challenging due to the complicated nonlinearity of the training dynamics. This work establishes a new linear convergence result for two-layer neural networks trained by continuous-time noisy gradient descent in the mean-field regime. Our result relies on a novel logarithmic Sobolev inequality for two-layer neural networks, and on uniform upper bounds on the logarithmic Sobolev constants for a family of measures determined by the evolving distribution of hidden neurons.
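For concreteness, a standard way to formalize the objects in this abstract (generic notation, not taken verbatim from the paper) is as follows. Continuous-time noisy gradient descent on the parameters \theta of a single neuron corresponds, in the mean-field limit, to the McKean-Vlasov (mean-field Langevin) dynamics

\[
\mathrm{d}\theta_t = -\nabla_\theta \frac{\delta F}{\delta \mu}(\mu_t, \theta_t)\,\mathrm{d}t + \sqrt{2\lambda}\,\mathrm{d}B_t, \qquad \mu_t = \operatorname{Law}(\theta_t),
\]

where F is the risk functional over neuron distributions, \lambda > 0 is the noise level, and B_t is a Brownian motion. A measure \nu satisfies a logarithmic Sobolev inequality with constant C_{\mathrm{LS}} if, for all smooth f,

\[
\operatorname{Ent}_\nu\!\left(f^2\right) \le 2\, C_{\mathrm{LS}}\, \mathbb{E}_\nu\!\left[\lVert \nabla f \rVert^2\right].
\]

When such an inequality holds uniformly along the family of measures generated by the dynamics, a Gronwall argument yields exponential decay of the entropy-regularized risk, of the representative form

\[
F_\lambda(\mu_t) - F_\lambda(\mu^\ast) \le e^{-2\lambda t / C_{\mathrm{LS}}} \left( F_\lambda(\mu_0) - F_\lambda(\mu^\ast) \right),
\]

which is the sense in which the convergence rate is linear; the exact constants and conventions in the paper may differ.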
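As a purely illustrative companion, the following sketch discretizes these dynamics for a small tanh network via Euler-Maruyama noisy gradient descent over m particles (neurons). All names, targets, and constants here are hypothetical choices for illustration, not the paper's setup.

import numpy as np

# Minimal sketch (not the paper's code): noisy gradient descent on a
# two-layer network f(x) = (1/m) * sum_j a_j * tanh(w_j . x), viewed as a
# particle discretization of the mean-field Langevin dynamics above.
rng = np.random.default_rng(0)
n, d, m = 200, 5, 1000               # samples, input dimension, hidden neurons
lam, step, iters = 1e-2, 1e-1, 2000  # noise level, step size, iterations

X = rng.standard_normal((n, d))
y = np.sin(X[:, 0])                  # a simple synthetic target

W = rng.standard_normal((m, d))      # row j holds neuron j's input weights
a = rng.standard_normal(m)           # output weights, one per neuron

for _ in range(iters):
    H = np.tanh(X @ W.T)             # (n, m) hidden activations
    resid = H @ a / m - y            # residuals under the mean-field 1/m scaling
    # Per-particle gradients of delta F / delta mu (the network's 1/m scaling
    # cancels against the m-fold rescaling of the per-particle gradient).
    grad_a = H.T @ resid / n
    grad_W = ((resid[:, None] * a[None, :] * (1.0 - H**2)).T @ X) / n
    # Euler-Maruyama step: gradient descent with an L2 confinement term plus
    # Gaussian noise of variance 2*lam*step (the entropic regularization).
    a += -step * (grad_a + lam * a) + np.sqrt(2 * lam * step) * rng.standard_normal(m)
    W += -step * (grad_W + lam * W) + np.sqrt(2 * lam * step) * rng.standard_normal((m, d))

print("final training MSE:", np.mean((np.tanh(X @ W.T) @ a / m - y) ** 2))

With lam set to 0 the noise vanishes and the update reduces to plain gradient descent; the noise term is what makes the logarithmic Sobolev machinery applicable.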


Related research

03/12/2023
Global Optimality of Elman-type RNN in the Mean-Field Regime
We analyze Elman-type Recurrent Neural Networks (RNNs) and their trainin...

06/20/2023
Mean-field Analysis of Generalization Errors
We propose a novel framework for exploring weak and L_2 generalization e...

06/18/2020
On Sparsity in Overparametrised Shallow ReLU Networks
The analysis of neural network training beyond their linearization regim...

04/19/2023
Leveraging the two timescale regime to demonstrate convergence of neural networks
We study the training dynamics of shallow neural networks, in a two-time...

02/05/2020
A mean-field theory of lazy training in two-layer neural nets: entropic regularization and controlled McKean-Vlasov dynamics
We consider the problem of universal approximation of functions by two-l...

07/13/2020
Quantitative Propagation of Chaos for SGD in Wide Neural Networks
In this paper, we investigate the limiting behavior of a continuous-time...

06/25/2020
The Quenching-Activation Behavior of the Gradient Descent Dynamics for Two-layer Neural Network Models
A numerical and phenomenological study of the gradient descent (GD) algo...
