STORM+: Fully Adaptive SGD with Momentum for Nonconvex Optimization

11/01/2021
by Kfir Y. Levy, et al.

In this work we investigate stochastic non-convex optimization problems where the objective is an expectation over smooth loss functions, and the goal is to find an approximate stationary point. The most popular approach to handling such problems relies on variance-reduction techniques, which are also known to achieve tight convergence rates, matching the lower bounds for this setting. Nevertheless, these techniques require careful maintenance of anchor points in conjunction with appropriately selected "mega-batch" sizes. This leads to a challenging hyperparameter-tuning problem that weakens their practicality. Recently, [Cutkosky and Orabona, 2019] showed that one can employ recursive momentum to avoid the use of anchor points and large batch sizes, and still obtain the optimal rate for this setting. Yet their method, called STORM, crucially relies on knowledge of the smoothness parameter, as well as a bound on the gradient norms. In this work we propose STORM+, a new method that is completely parameter-free, does not require large batch sizes, and obtains the optimal O(1/T^{1/3}) rate for finding an approximate stationary point. Our work builds on the STORM algorithm, in conjunction with a novel approach to adaptively setting the learning rate and momentum parameters.
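To make the recursive-momentum mechanism concrete, below is a minimal Python sketch of the estimator that STORM builds on and that STORM+ inherits. The adaptive step size and momentum weight in the sketch are simple AdaGrad-style placeholders assumed only for illustration, not the exact parameter-free rules derived in the paper, and `stoch_grad` is a hypothetical stochastic-gradient oracle.

import numpy as np

def storm_plus_sketch(stoch_grad, x0, T=1000, seed=0):
    """A minimal sketch of recursive-momentum SGD in the spirit of STORM/STORM+.

    The gradient estimator follows the STORM recursion
        d_t = g(x_t, xi_t) + (1 - a_t) * (d_{t-1} - g(x_{t-1}, xi_t)),
    where the SAME sample xi_t is evaluated at both x_t and x_{t-1}.

    The adaptive choices of eta and a below are AdaGrad-style placeholders,
    NOT the paper's exact rules.  stoch_grad(x, xi) -> gradient sample is a
    hypothetical interface assumed for this sketch.
    """
    rng = np.random.default_rng(seed)
    x_prev = np.asarray(x0, dtype=float)

    xi = rng.integers(10**9)                  # seed/index of the first sample
    d = stoch_grad(x_prev, xi)                # d_1 = g(x_1, xi_1)
    grad_sq = float(np.dot(d, d))             # running sum of ||g_t||^2
    est_sq = float(np.dot(d, d))              # running sum of ||d_t||^2

    eta = 1.0 / (1.0 + est_sq) ** (1.0 / 3.0)  # placeholder adaptive step size
    x = x_prev - eta * d

    for t in range(2, T + 1):
        xi = rng.integers(10**9)               # draw a fresh sample xi_t
        g_curr = stoch_grad(x, xi)             # g(x_t, xi_t)
        g_prev = stoch_grad(x_prev, xi)        # g(x_{t-1}, xi_t), same sample

        grad_sq += float(np.dot(g_curr, g_curr))
        a = 1.0 / (1.0 + grad_sq) ** (2.0 / 3.0)   # placeholder momentum weight

        d = g_curr + (1.0 - a) * (d - g_prev)  # STORM recursive momentum update
        est_sq += float(np.dot(d, d))
        eta = 1.0 / (1.0 + est_sq) ** (1.0 / 3.0)  # placeholder adaptive step size

        x_prev, x = x, x - eta * d

    return x

Note that the estimator d_t never requires an anchor point or a mega-batch: each iteration uses a single fresh sample evaluated at the current and previous iterates, which is exactly what removes the tuning burden the abstract describes.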


Related research

05/24/2019
Momentum-Based Variance Reduction in Non-Convex SGD
Variance reduction has emerged in recent years as a strong competitor to...

04/09/2023
μ^2-SGD: Stable Stochastic Optimization via a Double Momentum Mechanism
We consider stochastic convex optimization problems where the objective ...

08/11/2020
Riemannian stochastic recursive momentum method for non-convex optimization
We propose a stochastic recursive momentum method for Riemannian non-con...

08/10/2018
Weighted AdaGrad with Unified Momentum
Integrating adaptive learning rate and momentum techniques into SGD lead...

11/03/2022
Adaptive Stochastic Variance Reduction for Non-convex Finite-Sum Minimization
We propose an adaptive variance-reduction method, called AdaSpider, for ...

02/09/2020
Momentum Improves Normalized SGD
We provide an improved analysis of normalized SGD showing that adding mo...

11/04/2020
Reverse engineering learned optimizers reveals known and novel mechanisms
Learned optimizers are algorithms that can themselves be trained to solv...