Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks

11/03/2021
by Alexander Shevchenko, et al.

Understanding the properties of neural networks trained via stochastic gradient descent (SGD) is at the heart of the theory of deep learning. In this work, we take a mean-field view and consider a two-layer ReLU network trained via SGD for a univariate regularized regression problem. Our main result is that SGD is biased towards a simple solution: at convergence, the ReLU network implements a piecewise linear map of the input, and the number of "knot" points, i.e., points where the slope of the estimator changes, between two consecutive training inputs is at most three. In particular, as the number of neurons grows, the SGD dynamics are captured by the solution of a gradient flow and, at convergence, the distribution of the weights approaches the unique minimizer of a related free energy, which has a Gibbs form. Our key technical contribution is the analysis of the estimator resulting from this minimizer: we show that its second derivative vanishes everywhere, except at some specific locations which represent the knot points. We also provide empirical evidence that knots at locations distinct from the data points may occur, as predicted by our theory.
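To make the statement concrete, here is a minimal sketch (not the paper's code) that trains a wide two-layer ReLU network with single-sample SGD and weight decay on a toy univariate regression task, then counts slope changes of the learned map between consecutive training inputs. The sine target, hyperparameters, and kink-detection threshold are illustrative assumptions; as is standard in mean-field analyses, the per-neuron step size absorbs the 1/m factor from the network's output scaling.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy univariate regression data (an illustrative target, not from the paper).
X = np.linspace(-1.0, 1.0, 8)
Y = np.sin(np.pi * X)

# Two-layer ReLU network in mean-field scaling:
#   f(x) = (1/m) * sum_j a_j * relu(w_j * x + b_j)
m = 1000
w, b, a = rng.normal(size=m), rng.normal(size=m), rng.normal(size=m)

lr, wd, steps = 0.01, 1e-4, 50_000  # illustrative step size, l2 penalty, iterations

def predict(xs):
    # Evaluate f on an array of inputs; returns shape (len(xs),).
    return np.maximum(np.outer(xs, w) + b, 0.0) @ a / m

for _ in range(steps):
    i = rng.integers(len(X))              # single-sample SGD
    z = w * X[i] + b
    act = np.maximum(z, 0.0)
    err = a @ act / m - Y[i]              # residual of the estimator
    mask = z > 0.0                        # ReLU derivative
    # Per-neuron updates; the 1/m in f is absorbed into an O(m) step size.
    g_a = err * act + wd * a
    g_w = err * a * mask * X[i] + wd * w
    g_b = err * a * mask + wd * b
    a -= lr * g_a
    w -= lr * g_w
    b -= lr * g_b

# Locate "knots": grid points where the slope of the learned map changes.
grid = np.linspace(-1.0, 1.0, 4001)
f = predict(grid)
slope = np.diff(f) / np.diff(grid)
kinks = grid[1:-1][np.abs(np.diff(slope)) > 1e-2]  # heuristic threshold

# The theory predicts at most three knots between consecutive training inputs
# at convergence (in the infinite-width limit).
for lo, hi in zip(X[:-1], X[1:]):
    n = int(np.sum((kinks > lo) & (kinks < hi)))
    print(f"knots in ({lo:+.2f}, {hi:+.2f}): {n}")
```

A run of this sketch typically shows the fitted map interpolating the data with only a handful of effective slope changes per interval, even though each of the m neurons contributes a potential kink at x = -b_j / w_j.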


