First-order Stochastic Algorithms for Escaping From Saddle Points in Almost Linear Time

by Yi Xu, et al.

Two classes of methods have been proposed for escaping from saddle points: one uses the second-order information carried by the Hessian, and the other adds noise to the first-order information. The existing analysis of algorithms that add noise to the first-order information is quite involved and hides the essence of the added noise, which hinders further improvement of these algorithms. In this paper, we present a novel perspective on the noise-adding technique, namely that adding noise to the first-order information can help extract negative curvature from the Hessian matrix, and we provide a formal justification of this perspective by analyzing a simple first-order procedure. More importantly, the proposed procedure enables one to design purely first-order stochastic algorithms for escaping from non-degenerate saddle points with a much better time complexity (almost linear in the problem's dimensionality). In particular, we develop a first-order stochastic algorithm that combines our new technique with an existing algorithm that converges only to a first-order stationary point; it enjoys a time complexity of O(d/ϵ^3.5) for finding a nearly second-order stationary point x such that ‖∇F(x)‖ ≤ ϵ and ∇^2 F(x) ≥ -√(ϵ)I (with high probability), where F(·) denotes the objective function and d is the dimensionality of the problem. To the best of our knowledge, this is the best theoretical result for first-order algorithms in stochastic non-convex optimization, and it is competitive with, if not better than, existing stochastic algorithms that rely on second-order information.
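
The claim that noise in the first-order information can extract negative curvature can be illustrated with a small sketch. It is not the paper's procedure: it is a simplified, deterministic-gradient toy that assumes a quadratic objective F(x) = 0.5 xᵀAx with a saddle at the origin, and the names `grad_F` and `noise_initiated_direction`, the step size, the perturbation radius, and the per-step renormalization are all illustrative choices. The idea shown is that the gradient difference ∇F(x + u) − ∇F(x) approximates the Hessian-vector product ∇^2 F(x)u for small u, so repeating u ← u − η(∇F(x + u) − ∇F(x)) from a random start behaves like a power iteration on I − η∇^2 F(x) and amplifies the negative-curvature direction.

```python
import numpy as np

def grad_F(x, A):
    """Gradient of the toy objective F(x) = 0.5 * x^T A x (an assumption)."""
    return A @ x

def noise_initiated_direction(grad, x, eta=0.1, radius=1e-3, steps=200, seed=0):
    """Estimate a negative-curvature direction at x using gradients only.

    Hypothetical helper: starts from a small random perturbation u and
    repeatedly applies u <- u - eta * (grad(x + u) - grad(x)).
    """
    rng = np.random.default_rng(seed)
    u = rng.normal(size=x.shape)
    u *= radius / np.linalg.norm(u)      # small random noise around x
    g_x = grad(x)
    for _ in range(steps):
        # Gradient difference stands in for the Hessian-vector product H u.
        u = u - eta * (grad(x + u) - g_x)
        # Rescale so u stays small (a simplification made in this sketch).
        u *= radius / np.linalg.norm(u)
    return u / np.linalg.norm(u)

# Toy saddle: two positive-curvature directions and one negative one.
A = np.diag([1.0, 0.5, -0.3])
x = np.zeros(3)

v = noise_initiated_direction(lambda z: grad_F(z, A), x)
curvature = v @ A @ v                    # Rayleigh quotient v^T H v
print("direction:", v)
print("curvature along direction:", curvature)
```

On this toy problem the returned unit vector aligns with the third coordinate axis, where the curvature is negative, so a step along ±v decreases F even though the gradient at the saddle is zero. No Hessian is ever formed; only gradient evaluations are used.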




Stochastic Non-convex Optimization with Strong High Probability Second-order Convergence

In this paper, we study stochastic non-convex optimization with non-conv...

On Noisy Negative Curvature Descent: Competing with Gradient Descent for Faster Non-convex Optimization

The Hessian-vector product has been utilized to find a second-order stat...

On the Second-order Convergence Properties of Random Search Methods

We study the theoretical convergence properties of random-search methods...

Implementation of Stochastic Quasi-Newton's Method in PyTorch

In this paper, we implement the Stochastic Damped LBFGS (SdLBFGS) for st...

Curvature-Exploiting Acceleration of Elastic Net Computations

This paper introduces an efficient second-order method for solving the e...

The Hypervolume Indicator Hessian Matrix: Analytical Expression, Computational Time Complexity, and Sparsity

The problem of approximating the Pareto front of a multiobjective optimi...

UAdam: Unified Adam-Type Algorithmic Framework for Non-Convex Stochastic Optimization

Adam-type algorithms have become a preferred choice for optimisation in ...
