A Continuized View on Nesterov Acceleration for Stochastic Gradient Descent and Randomized Gossip

06/10/2021
by   Mathieu Even, et al.

We introduce the continuized Nesterov acceleration, a close variant of Nesterov acceleration whose variables are indexed by a continuous time parameter. The two variables continuously mix following a linear ordinary differential equation and take gradient steps at random times. This continuized variant benefits from the best of the continuous and the discrete frameworks: as a continuous process, one can use differential calculus to analyze convergence and obtain analytical expressions for the parameters; and a discretization of the continuized process can be computed exactly, with convergence rates similar to those of Nesterov's original acceleration. We show that the discretization has the same structure as Nesterov acceleration, but with random parameters. We provide continuized Nesterov acceleration under deterministic as well as stochastic gradients, with either additive or multiplicative noise. Finally, using our continuized framework and expressing the gossip averaging problem as the stochastic minimization of a certain energy function, we provide the first rigorous acceleration of asynchronous gossip algorithms.
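The core mechanism can be sketched numerically: two iterates mix along a linear ODE (which can be solved exactly between events), and gradient steps occur at the arrival times of a Poisson clock. Below is a minimal illustration on a strongly convex quadratic; the mixing rate and step sizes are simplified illustrative choices, not the analytically optimal parameters derived in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative strongly convex quadratic: f(x) = 0.5 x^T A x - b^T x
d = 20
M = rng.standard_normal((d, d))
A = M @ M.T / d + 0.1 * np.eye(d)      # positive definite Hessian
b = rng.standard_normal(d)
x_star = np.linalg.solve(A, b)         # minimizer, used only to measure error

eigs = np.linalg.eigvalsh(A)
L, mu = eigs[-1], eigs[0]              # smoothness / strong convexity constants
eta = np.sqrt(mu / L)                  # mixing rate (simplified choice)

x = np.zeros(d)                        # "gradient" iterate
z = np.zeros(d)                        # "momentum" iterate

for _ in range(2000):
    # Random inter-arrival time of the Poisson clock.
    dt = rng.exponential(1.0)
    # Exact solution of the linear mixing ODE
    #   dx = eta (z - x) dt,   dz = eta (x - z) dt
    # over an interval of length dt: the mean (x + z)/2 is preserved,
    # and the difference x - z decays by exp(-2 eta dt).
    m = 0.5 * (x + z)
    w = 0.5 * np.exp(-2.0 * eta * dt) * (x - z)
    x, z = m + w, m - w
    # Gradient step at the arrival time. Both iterates use step 1/L here
    # for simplicity; the paper derives sharper, analytically tuned steps.
    g = A @ x - b
    x -= g / L
    z -= g / L

print(np.linalg.norm(x - x_star))      # distance to the minimizer
```

Because the inter-arrival times are exponential, the resulting discrete recursion has the same two-variable structure as Nesterov's method but with random mixing coefficients, which is the "random parameters" discretization mentioned in the abstract.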

