ROOT-SGD: Sharp Nonasymptotics and Asymptotic Efficiency in a Single Algorithm

08/28/2020
by   Chris Junchi Li, et al.

The theory and practice of stochastic optimization has focused on stochastic gradient descent (SGD) in recent years, retaining the basic first-order stochastic nature of SGD while aiming to improve it via mechanisms such as averaging, momentum, and variance reduction. Improvement can be measured along various dimensions, however, and it has proved difficult to achieve improvements both in terms of nonasymptotic measures of convergence rate and asymptotic measures of distributional tightness. In this work, we consider first-order stochastic optimization from a general statistical point of view, motivating a specific form of recursive averaging of past stochastic gradients. The resulting algorithm, which we refer to as Recursive One-Over-T SGD (ROOT-SGD), matches the state-of-the-art convergence rate among online variance-reduced stochastic approximation methods. Moreover, under slightly stronger distributional assumptions, the rescaled last-iterate of ROOT-SGD converges to a zero-mean Gaussian distribution that achieves near-optimal covariance.
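
The page does not reproduce the ROOT-SGD update rule itself, but the abstract's description of "recursive averaging of past stochastic gradients" with one-over-t weights can be sketched roughly as below. This is a minimal illustrative sketch, not the paper's verbatim algorithm: the function name root_sgd, the constant step size eta, and the exact blending weight (1 - 1/t) applied to the corrected past estimate are assumptions made for illustration.

```python
import numpy as np

def root_sgd(grad, x0, n_steps, eta, rng):
    """Hedged sketch of recursive one-over-t gradient averaging (ROOT-SGD style).

    grad(x, xi) returns a stochastic gradient at x for the random sample xi.
    The weighting below is an illustrative reading of "recursive averaging of
    past stochastic gradients", not a transcription of the paper's update rule.
    """
    x_prev = np.array(x0, dtype=float)
    # Initialize the gradient estimate with a single stochastic gradient.
    xi = rng.standard_normal(x_prev.shape)
    v = grad(x_prev, xi)
    x = x_prev - eta * v
    for t in range(2, n_steps + 1):
        xi = rng.standard_normal(x.shape)   # fresh random sample
        g_new = grad(x, xi)                 # gradient at the current iterate
        g_old = grad(x_prev, xi)            # same sample, previous iterate
        # Recursive 1/t averaging: reuse the same sample at both iterates to
        # correct the past estimate, then downweight it at rate 1/t so that
        # accumulated noise is averaged out over time.
        v = g_new + (1.0 - 1.0 / t) * (v - g_old)
        x_prev, x = x, x - eta * v
    return x

# Toy usage (also an assumption): noisy quadratic f(x) = 0.5 * ||x - 1||^2
# with additive gradient noise.
rng = np.random.default_rng(0)
grad = lambda x, xi: (x - 1.0) + 0.1 * xi
x_hat = root_sgd(grad, x0=np.zeros(5), n_steps=10_000, eta=0.5, rng=rng)
```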


Related research

10/21/2017  A Novel Stochastic Stratified Average Gradient Method: Convergence Rate and Its Complexity
SGD (Stochastic Gradient Descent) is a popular algorithm for large scale...

07/13/2023  Weighted Averaged Stochastic Gradient Descent: Asymptotic Normality and Optimality
Stochastic Gradient Descent (SGD) is one of the simplest and most popula...

01/22/2019  On convergence rate of stochastic proximal point algorithm without strong convexity, smoothness or bounded gradients
Significant parts of the recent learning literature on stochastic optimi...

12/02/2022  Covariance Estimators for the ROOT-SGD Algorithm in Online Learning
Online learning naturally arises in many statistical and machine learnin...

09/30/2020  Accelerating Optimization and Reinforcement Learning with Quasi-Stochastic Approximation
The ODE method has been a workhorse for algorithm design and analysis si...

06/08/2019  Reducing the variance in online optimization by transporting past gradients
Most stochastic optimization methods use gradients once before discardin...

06/08/2018  Lightweight Stochastic Optimization for Minimizing Finite Sums with Infinite Data
Variance reduction has been commonly used in stochastic optimization. It...
