Mean-field analysis for heavy ball methods: Dropout-stability, connectivity, and global convergence

10/13/2022
by Diyuan Wu, et al.

The stochastic heavy ball method (SHB), also known as stochastic gradient descent (SGD) with Polyak's momentum, is widely used in training neural networks. However, despite the remarkable success of this algorithm in practice, its theoretical characterization remains limited. In this paper, we focus on neural networks with two and three layers and provide a rigorous understanding of the properties of the solutions found by SHB: (i) stability after dropping out part of the neurons, (ii) connectivity along a low-loss path, and (iii) convergence to the global optimum. To achieve this goal, we take a mean-field view and relate the SHB dynamics to a certain partial differential equation in the limit of large network widths. This mean-field perspective has inspired a recent line of work focusing on SGD; in contrast, our paper considers an algorithm with momentum. More specifically, after proving existence and uniqueness of the limit differential equations, we show convergence to the global optimum and give a quantitative bound between the mean-field limit and the SHB dynamics of a finite-width network. Armed with this last bound, we are able to establish the dropout-stability and connectivity of SHB solutions.
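For readers unfamiliar with SHB, the sketch below illustrates the update rule the abstract refers to: SGD with Polyak's momentum applied to a toy two-layer ReLU network under mean-field (1/width) output scaling. The network width, data, step size, and momentum value are illustrative choices for this sketch, not the settings analyzed in the paper.

```python
import numpy as np

# Minimal sketch of the stochastic heavy ball (SHB) update, i.e. SGD with
# Polyak's momentum, on a two-layer network f(x) = sum_j a_j * relu(w_j . x).
# All hyperparameters here are illustrative, not the paper's settings.

rng = np.random.default_rng(0)
n, d, width = 256, 10, 1000                 # samples, input dim, network width
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0])                         # toy regression target

W = rng.normal(size=(width, d)) / np.sqrt(d)
a = rng.normal(size=width) / width          # mean-field scaling of output weights
vW, va = np.zeros_like(W), np.zeros_like(a) # momentum buffers

eta, beta, batch = 0.1, 0.9, 32             # step size, momentum, minibatch size

for step in range(2000):
    idx = rng.choice(n, size=batch, replace=False)
    Xb, yb = X[idx], y[idx]

    pre = Xb @ W.T                          # (batch, width) pre-activations
    act = np.maximum(pre, 0.0)              # ReLU
    err = act @ a - yb                      # d(loss)/d(pred) for 0.5 * MSE

    grad_a = act.T @ err / batch
    grad_W = ((err[:, None] * a[None, :]) * (pre > 0)).T @ Xb / batch

    # Heavy ball: accumulate the gradient into the momentum buffer,
    # then step along the buffer rather than the raw gradient.
    va = beta * va + grad_a
    vW = beta * vW + grad_W
    a -= eta * va
    W -= eta * vW
```

Equivalently, the update can be written in two-step form as x_{k+1} = x_k - eta * grad_k + beta * (x_k - x_{k-1}), which is the classical Polyak heavy-ball recursion.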


