Momentum and Stochastic Momentum for Stochastic Gradient, Newton, Proximal Point and Subspace Descent Methods

by   Nicolas Loizou, et al.

In this paper we study several classes of stochastic optimization algorithms enriched with heavy ball momentum. Among the methods studied are: stochastic gradient descent, stochastic Newton, stochastic proximal point and stochastic dual subspace ascent. This is the first time momentum variants of several of these methods are studied. We choose to perform our analysis in a setting in which all of the above methods are equivalent. We prove global nonassymptotic linear convergence rates for all methods and various measures of success, including primal function values, primal iterates (in L2 sense), and dual function values. We also show that the primal iterates converge at an accelerated linear rate in the L1 sense. This is the first time a linear rate is shown for the stochastic heavy ball method (i.e., stochastic gradient descent method with momentum). Under somewhat weaker conditions, we establish a sublinear convergence rate for Cesaro averages of primal iterates. Moreover, we propose a novel concept, which we call stochastic momentum, aimed at decreasing the cost of performing the momentum step. We prove linear convergence of several stochastic methods with stochastic momentum, and show that in some sparse data regimes and for sufficiently small momentum parameters, these methods enjoy better overall complexity than methods with deterministic momentum. Finally, we perform extensive numerical testing on artificial and real datasets, including data coming from average consensus problems.


page 1

page 2

page 3

page 4


Understanding the Role of Momentum in Stochastic Gradient Methods

The use of momentum in stochastic gradient methods has become a widespre...

Convergence and Stability of the Stochastic Proximal Point Algorithm with Momentum

Stochastic gradient descent with momentum (SGDM) is the dominant algorit...

Stochastic Reformulations of Linear Systems: Algorithms and Convergence Theory

We develop a family of reformulations of an arbitrary consistent linear ...

On the convergence of the Stochastic Heavy Ball Method

We provide a comprehensive analysis of the Stochastic Heavy Ball (SHB) m...

Fast stochastic dual coordinate descent algorithms for linearly constrained convex optimization

The problem of finding a solution to the linear system Ax = b with certa...

Discriminative Bayesian Filtering Lends Momentum to the Stochastic Newton Method for Minimizing Log-Convex Functions

To minimize the average of a set of log-convex functions, the stochastic...

Zap Meets Momentum: Stochastic Approximation Algorithms with Optimal Convergence Rate

There are two well known Stochastic Approximation techniques that are kn...

Please sign up or login with your details

Forgot password? Click here to reset