Convergence of stochastic gradient descent under a local Łojasiewicz condition for deep neural networks

04/18/2023
by Jing An, et al.

We extend the global convergence result of Chatterjee <cit.> by considering stochastic gradient descent (SGD) for non-convex objective functions. With minimal additional assumptions that can be realized by finitely wide neural networks, we prove that if the iterates are initialized inside a local region where the Łojasiewicz condition holds, then with positive probability they converge to a global minimum inside this region. A key component of our proof is to ensure that the entire SGD trajectory stays inside the local region with positive probability. To that end, we assume that the SGD noise scales with the objective function, a property known as machine-learning-type noise that is achievable in many practical examples. Furthermore, we provide a negative argument showing why boundedness of the noise combined with Robbins-Monro-type step sizes is not sufficient to keep this key component valid.
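To make the noise assumption concrete, here is a minimal sketch (not taken from the paper) of SGD on a toy one-dimensional non-convex objective f(x) = (1 - x^2)^2 / 4, which satisfies a local Łojasiewicz inequality |f'(x)|^2 >= c f(x) near the minimizer x = 1. The gradient noise has variance proportional to f(x), mimicking machine-learning-type noise; the objective, step size, and constants below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """Toy non-convex objective with global minima at x = +/- 1."""
    return 0.25 * (1.0 - x**2) ** 2

def grad_f(x):
    """Exact gradient: f'(x) = -x * (1 - x^2)."""
    return -x * (1.0 - x**2)

def sgd_step(x, eta, noise_scale):
    # Machine-learning-type noise: standard deviation proportional to sqrt(f(x)),
    # so the noise variance scales with the objective and vanishes at a global minimum.
    noise = noise_scale * np.sqrt(f(x)) * rng.standard_normal()
    return x - eta * (grad_f(x) + noise)

x = 0.8          # initialize inside the local region around the minimizer x = 1
eta = 0.05       # constant step size (illustrative)
for _ in range(2000):
    x = sgd_step(x, eta, noise_scale=0.5)

print(f"final iterate = {x:.6f}, objective = {f(x):.2e}")
```

Because the noise vanishes together with the objective, iterates that remain in the local basin are not kicked out as they approach the minimum; this is the intuition behind requiring the noise to scale with the objective rather than merely being bounded.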


Related research

04/02/2019  Convergence rates for the stochastic gradient descent method for non-convex objective functions
05/04/2021  Stochastic gradient descent with noise of machine learning type. Part I: Discrete time analysis
10/04/2021  Global Convergence and Stability of Stochastic Gradient Descent
04/12/2022  An Algebraically Converging Stochastic Gradient Descent Algorithm for Global Optimization
07/21/2023  Convergence of SGD for Training Neural Networks with Sliced Wasserstein Losses
11/10/2021  SGD Through the Lens of Kolmogorov Complexity
07/07/2019  Quantitative W_1 Convergence of Langevin-Like Stochastic Processes with Non-Convex Potential State-Dependent Noise
