Breaking the Lower Bound with (Little) Structure: Acceleration in Non-Convex Stochastic Optimization with Heavy-Tailed Noise

by Zijian Liu, et al.
NYU college

We consider the stochastic optimization problem with smooth but not necessarily convex objectives in the heavy-tailed noise regime, where the noise of the stochastic gradient is assumed to have a bounded $p$-th moment ($p\in(1,2]$). Zhang et al. (2020) were the first to prove the $\Omega(T^{(1-p)/(3p-2)})$ lower bound for convergence (in expectation) and provided a simple clipping algorithm that matches this optimal rate. Cutkosky and Mehta (2021) proposed another algorithm, which was shown to achieve the nearly optimal high-probability convergence guarantee $O(\log(T/\delta)\,T^{(1-p)/(3p-2)})$, where $\delta$ is the probability of failure. However, this guarantee was only established under the additional assumption that the stochastic gradient itself has a bounded $p$-th moment, which fails to hold even for quadratic objectives with centered Gaussian noise. In this work, we first improve the analysis of the algorithm in Cutkosky and Mehta (2021) to obtain the same nearly optimal high-probability convergence rate $O(\log(T/\delta)\,T^{(1-p)/(3p-2)})$ without this restrictive assumption. Next, and curiously, we show that one can achieve a faster rate than that dictated by the lower bound $\Omega(T^{(1-p)/(3p-2)})$ with only a tiny bit of structure, namely when the objective function $F(x)$ is assumed to take the form $\mathbb{E}_{\Xi\sim\mathcal{D}}[f(x,\Xi)]$, arguably the most widely applicable class of stochastic optimization problems. For this class of problems, we propose the first variance-reduced accelerated algorithm and establish that it guarantees a high-probability convergence rate of $O(\log(T/\delta)\,T^{(1-p)/(2p-1)})$ under a mild condition, which is faster than $\Omega(T^{(1-p)/(3p-2)})$. Notably, even when specialized to the finite-variance case ($p=2$), our result yields the (near-)optimal high-probability rate $O(\log(T/\delta)\,T^{-1/3})$.
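To make the gap between the two rates concrete, the short Python sketch below evaluates, with exact rational arithmetic, the exponent of $T$ in the lower bound, $(1-p)/(3p-2)$, and in the variance-reduced rate, $(1-p)/(2p-1)$, for a few values of $p$. It ignores the $\log(T/\delta)$ factor, and the function names are illustrative only, not taken from the paper. Because $1-p<0$ and $2p-1<3p-2$ whenever $p>1$, the second exponent is always more negative, so the corresponding bound decays faster in $T$.

```python
from fractions import Fraction


def lower_bound_exponent(p):
    """Exponent of T in the Omega(T^((1-p)/(3p-2))) lower bound (illustrative name)."""
    return (1 - p) / (3 * p - 2)


def vr_rate_exponent(p):
    """Exponent of T in the O(log(T/delta) T^((1-p)/(2p-1))) rate, log factor ignored."""
    return (1 - p) / (2 * p - 1)


# A few moment parameters p in (1, 2]; p = 2 is the finite-variance case.
for p in [Fraction(5, 4), Fraction(3, 2), Fraction(7, 4), Fraction(2)]:
    lb = lower_bound_exponent(p)
    vr = vr_rate_exponent(p)
    # A more negative exponent means faster decay in T.
    assert vr < lb
    print(f"p={p}: lower bound T^({lb}), variance-reduced rate T^({vr})")
```

At $p=2$ the two exponents are $-1/4$ and $-1/3$, which recovers the $T^{-1/3}$ finite-variance rate mentioned above; as $p\to 1$ both exponents approach $0$.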


Near-Optimal High-Probability Convergence for Non-Convex Stochastic Optimization with Variance Reduction

Traditional analyses for non-convex stochastic optimization problems cha...

Stochastic Nonsmooth Convex Optimization with Heavy-Tailed Noises

Recently, several studies consider the stochastic optimization problem b...

High Probability Bounds for a Class of Nonconvex Algorithms with AdaGrad Stepsize

In this paper, we propose a new, simplified high probability analysis of...

High Probability Convergence of Stochastic Gradient Methods

In this work, we describe a generic approach to show convergence with hi...

Nearly Optimal Robust Method for Convex Compositional Problems with Heavy-Tailed Noise

In this paper, we propose robust stochastic algorithms for solving conve...

Finite Precision Stochastic Optimization -- Accounting for the Bias

We consider first order stochastic optimization where the oracle must qu...
