META-STORM: Generalized Fully-Adaptive Variance Reduced SGD for Unbounded Functions

09/29/2022
by Zijian Liu, et al.

We study the application of variance reduction (VR) techniques to general non-convex stochastic optimization problems. In this setting, the recent work STORM [Cutkosky-Orabona '19] overcomes the drawback of earlier VR methods, which must compute gradients over "mega-batches". STORM instead uses recursive momentum to achieve the VR effect; it was later made fully adaptive in STORM+ [Levy et al., '21], where full adaptivity removes the need to know problem-specific parameters, such as the smoothness of the objective and bounds on the variance and norm of the stochastic gradients, in order to set the step size. However, STORM+ crucially relies on the assumption that the function values are bounded, which excludes a large class of useful functions. In this work, we propose META-STORM, a generalized framework of STORM+ that removes the bounded-function-values assumption while still attaining the optimal convergence rate for non-convex optimization. META-STORM not only maintains full adaptivity, removing the need to obtain problem-specific parameters, but also improves the convergence rate's dependence on the problem parameters. Furthermore, META-STORM admits a wide range of parameter settings that subsumes previous methods, allowing for greater flexibility across settings. Finally, we demonstrate the effectiveness of META-STORM through experiments on common deep learning tasks. Our algorithm improves upon the previous work STORM+ and is competitive with widely used algorithms after the addition of per-coordinate updates and exponential moving average heuristics.
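
To make the recursive-momentum mechanism shared by STORM, STORM+, and META-STORM more concrete, below is a minimal Python sketch of the original STORM-style update under illustrative assumptions: grad_oracle(x, seed) is a hypothetical stochastic gradient oracle, and the constants k, w, c and the 1/3-power step-size schedule follow STORM's defaults rather than META-STORM's fully adaptive rules.

    import numpy as np

    def storm_style_sgd(grad_oracle, x0, steps, k=0.1, w=1.0, c=1.0):
        # Sketch of a STORM-style recursive-momentum estimator (Cutkosky-Orabona '19).
        # grad_oracle(x, seed) returns a stochastic gradient at x on the sample `seed`;
        # names and constants here are illustrative, not META-STORM's actual rules.
        x = np.asarray(x0, dtype=float)
        g = grad_oracle(x, seed=0)        # first stochastic gradient
        d = g.copy()                      # variance-reduced momentum estimator
        g_sum = 0.0                       # running sum of squared gradient norms
        for t in range(1, steps + 1):
            g_sum += float(np.dot(g, g))
            eta = k / (w + g_sum) ** (1.0 / 3.0)   # adaptive step size
            a = min(1.0, c * eta ** 2)             # momentum weight
            x_new = x - eta * d
            # The same sample (seed=t) is evaluated at both the new and old
            # iterate, which is what produces the variance-reduction effect:
            g_new = grad_oracle(x_new, seed=t)
            g_old = grad_oracle(x, seed=t)
            d = g_new + (1.0 - a) * (d - g_old)
            x, g = x_new, g_new
        return x

For example, storm_style_sgd(lambda x, seed: 2.0 * x + np.random.default_rng(seed).normal(size=x.shape), np.ones(5), steps=2000) runs the sketch on a noisy quadratic f(x) = ||x||^2.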
