Convergence of stochastic gradient descent under a local Łojasiewicz condition for deep neural networks

by Jing An, et al.

We extend the global convergence result of Chatterjee <cit.> by considering stochastic gradient descent (SGD) for non-convex objective functions. Under minimal additional assumptions that can be realized by finitely wide neural networks, we prove that if the iterates are initialized inside a local region where the Łojasiewicz condition holds, then with positive probability they converge to a global minimum inside this region. A key component of the proof is to ensure that the entire SGD trajectory stays inside the local region with positive probability. To that end, we assume that the SGD noise scales with the objective function value, a condition known as noise of machine-learning type, which is achievable in many practical examples. Furthermore, we provide a negative argument showing why the boundedness of the noise together with Robbins-Monro-type step sizes is not enough to keep this key component valid.
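The noise assumption above can be illustrated with a minimal sketch (not the paper's construction; the toy objective, step size, and noise model are illustrative assumptions): run SGD on a non-convex one-dimensional function satisfying a local Łojasiewicz inequality near its global minima, with gradient noise whose standard deviation scales like the square root of the objective value, so the noise vanishes exactly at a global minimum.

```python
import numpy as np

# Toy non-convex objective f(x) = (x^2 - 1)^2, which satisfies a local
# Lojasiewicz inequality near its global minima x = +/-1.
def f(x):
    return (x**2 - 1.0)**2

def grad_f(x):
    return 4.0 * x * (x**2 - 1.0)

rng = np.random.default_rng(0)
x = 1.5      # initialize inside a local region around the minimum x = 1
eta = 0.01   # constant step size (illustrative choice)

for _ in range(20000):
    # Machine-learning-type noise: standard deviation proportional to
    # sqrt(f(x)), so the noise variance scales with the objective value
    # and vanishes at the global minimum.
    noise = np.sqrt(f(x)) * rng.standard_normal()
    x -= eta * (grad_f(x) + noise)

print(f(x))  # objective value close to 0
```

Because the noise shrinks with the objective, the iterates are unlikely to be kicked out of the local region once the objective is small, which is the mechanism the trajectory-confinement argument exploits; with bounded (non-vanishing) noise, the same run would keep fluctuating around the minimum instead.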



