Stochastic Gradient Descent for Nonconvex Learning without Bounded Gradient Assumptions

02/03/2019
by Yunwen Lei, et al.

Stochastic gradient descent (SGD) is a popular and efficient method with wide applications in training deep neural networks and other nonconvex models. While the behavior of SGD is well understood in the convex learning setting, the existing theoretical results for SGD applied to nonconvex objective functions are far from mature. For example, existing results require imposing a nontrivial assumption that the gradients are uniformly bounded over all iterates encountered in the learning process, which is hard to verify in practical implementations. In this paper, we establish a rigorous theoretical foundation for SGD in nonconvex learning by showing that this boundedness assumption can be removed without affecting convergence rates. In particular, we establish sufficient conditions for almost sure convergence as well as optimal convergence rates for SGD applied to both general nonconvex objective functions and gradient-dominated objective functions. Linear convergence is further derived in the case of zero variance.
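As a concrete illustration of the setting studied (a minimal sketch, not taken from the paper, which is purely theoretical): the snippet below runs plain SGD with a decaying step size on a synthetic nonconvex least-squares objective. The data, the sine-based objective, and the schedule eta_t = eta0 / sqrt(t + 1) are illustrative assumptions; no gradient clipping or projection is used, so no uniform bound on the gradients is enforced along the iterates.

```python
# Minimal SGD sketch on a synthetic nonconvex objective (illustrative only;
# the objective, data, and step-size schedule are assumptions, not the paper's).
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
y = np.sin(X @ rng.normal(size=d)) + 0.1 * rng.normal(size=n)

def stochastic_grad(w, i):
    """Gradient of the per-example loss 0.5 * (sin(x_i^T w) - y_i)^2."""
    x_i, y_i = X[i], y[i]
    pred = np.sin(x_i @ w)
    return (pred - y_i) * np.cos(x_i @ w) * x_i

w = np.zeros(d)
eta0 = 0.5
for t in range(5000):
    i = rng.integers(n)                  # sample one example uniformly
    eta_t = eta0 / np.sqrt(t + 1)        # decaying step size
    w -= eta_t * stochastic_grad(w, i)   # plain SGD step, no gradient clipping

# Report the norm of the full gradient at the final iterate.
full_grad = np.mean([stochastic_grad(w, i) for i in range(n)], axis=0)
print(f"full-gradient norm after training: {np.linalg.norm(full_grad):.4f}")
```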

Related research

04/02/2019
Convergence rates for the stochastic gradient descent method for non-convex objective functions
We prove the local convergence to minima and estimates on the rate of co...

10/09/2018
Characterization of Convex Objective Functions and Optimal Expected Convergence Rates for SGD
We study Stochastic Gradient Descent (SGD) with diminishing step sizes f...

11/09/2021
Learning Rates for Nonconvex Pairwise Learning
Pairwise learning is receiving increasing attention since it covers many...

06/26/2023
Nonconvex Stochastic Bregman Proximal Gradient Method with Application to Deep Learning
The widely used stochastic gradient methods for minimizing nonconvex com...

06/16/2023
Gradient is All You Need?
In this paper we provide a novel analytical perspective on the theoretic...

06/06/2023
Understanding Progressive Training Through the Framework of Randomized Coordinate Descent
We propose a Randomized Progressive Training algorithm (RPT) – a stochas...

04/15/2020
On Learning Rates and Schrödinger Operators
The learning rate is perhaps the single most important parameter in the ...
