Accelerated Linear Convergence of Stochastic Momentum Methods in Wasserstein Distances

by Bugra Can, et al.

Momentum methods such as Polyak's heavy ball (HB) method, Nesterov's accelerated gradient (AG) method, and the accelerated projected gradient (APG) method are commonly used in machine learning practice, but their performance is quite sensitive to noise in the gradients. We study these methods under a first-order stochastic oracle model where noisy estimates of the gradients are available. For strongly convex problems, we show that the distribution of the iterates of AG converges with the accelerated O(√κ log(1/ε)) linear rate to a ball of radius ε centered at a unique invariant distribution in the 1-Wasserstein metric, where κ is the condition number, as long as the noise variance is smaller than an explicit upper bound we can provide. Our analysis also certifies linear convergence rates as a function of the stepsize, momentum parameter and noise variance, recovering the accelerated rates in the noiseless case and quantifying the level of noise that can be tolerated to achieve a given performance. In the special case of strongly convex quadratic objectives, we can show accelerated linear rates in the p-Wasserstein metric for any p ≥ 1, with improved sensitivity to noise for both AG and HB, through a non-asymptotic analysis under some additional assumptions on the noise structure. Our analysis for HB and AG also leads to improved non-asymptotic convergence bounds in suboptimality for both deterministic and stochastic settings, which are of independent interest. To the best of our knowledge, these are the first linear convergence results for stochastic momentum methods under the stochastic oracle model. We also extend our results to the APG method and to weakly convex functions, showing accelerated rates when the noise magnitude is sufficiently small.
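As a rough illustration of the setting described above, the following sketch runs HB and AG on a small strongly convex quadratic with additive Gaussian gradient noise. Everything in it is an assumption made for illustration: the diagonal test objective, the function names `noisy_grad` and `run`, and the textbook stepsize and momentum choices, which are not necessarily the parameters analyzed in the paper. It tracks only the distance of one trajectory to the minimizer and does not compute Wasserstein distances.

```python
import math
import random


def noisy_grad(x, eigs, sigma, rng):
    """Gradient of f(x) = 0.5 * sum(eigs[i] * x[i]**2) plus i.i.d. Gaussian noise.

    Hypothetical stochastic oracle: additive noise with standard deviation sigma.
    """
    return [a * xi + sigma * rng.gauss(0.0, 1.0) for a, xi in zip(eigs, x)]


def run(method, eigs, x0, sigma, iters, seed=0):
    """Run 'AG' or 'HB' and return the final distance to the minimizer x* = 0."""
    mu, L = min(eigs), max(eigs)
    kappa = L / mu
    r = (math.sqrt(kappa) - 1.0) / (math.sqrt(kappa) + 1.0)
    if method == "AG":
        # Common textbook parameters for Nesterov's method (assumed here).
        alpha, beta = 1.0 / L, r
    elif method == "HB":
        # Common textbook parameters for heavy ball on quadratics (assumed here).
        alpha, beta = 4.0 / (math.sqrt(L) + math.sqrt(mu)) ** 2, r * r
    else:
        raise ValueError("method must be 'AG' or 'HB'")

    rng = random.Random(seed)
    x_prev, x = list(x0), list(x0)
    for _ in range(iters):
        momentum = [beta * (xi - pi) for xi, pi in zip(x, x_prev)]
        if method == "AG":
            # AG queries the oracle at the extrapolated point x + momentum.
            point = [xi + mi for xi, mi in zip(x, momentum)]
        else:
            # HB queries the oracle at the current iterate.
            point = x
        g = noisy_grad(point, eigs, sigma, rng)
        x_next = [xi + mi - alpha * gi for xi, mi, gi in zip(x, momentum, g)]
        x_prev, x = x, x_next
    return math.sqrt(sum(xi * xi for xi in x))


if __name__ == "__main__":
    eigs = [1.0, 100.0]  # condition number kappa = 100
    x0 = [1.0, 1.0]
    for method in ("AG", "HB"):
        for sigma in (0.0, 0.1):
            dist = run(method, eigs, x0, sigma, iters=500)
            print(method, "sigma =", sigma, "final distance =", dist)
```

With `sigma = 0.0` both methods should drive the distance to the minimizer toward zero, while with `sigma > 0` the iterates are expected to settle into a noise-dependent neighborhood rather than converge to a point. This matches the qualitative picture in the abstract of convergence to an invariant distribution.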


ASVRG: Accelerated Proximal SVRG

This paper proposes an accelerated proximal stochastic variance reduced ...

Tradeoffs between convergence rate and noise amplification for momentum-based accelerated optimization algorithms

We study momentum-based first-order optimization algorithms in which the...

Parametrized Accelerated Methods Free of Condition Number

Analyses of accelerated (momentum-based) gradient descent usually assume...

Fast Stochastic Composite Minimization and an Accelerated Frank-Wolfe Algorithm under Parallelization

We consider the problem of minimizing the sum of two convex functions. O...

Complexity Guarantees for Polyak Steps with Momentum

In smooth strongly convex optimization, or in the presence of Hölderian ...

Minimal error momentum Bregman-Kaczmarz

The Bregman-Kaczmarz method is an iterative method which can solve stron...

On the Convergence of Nesterov's Accelerated Gradient Method in Stochastic Settings

We study Nesterov's accelerated gradient method in the stochastic approx...
