DeepAI AI Chat
Log In Sign Up

Harder, Better, Faster, Stronger Convergence Rates for Least-Squares Regression

by   Aymeric Dieuleveut, et al.
Cole Normale Suprieure

We consider the optimization of a quadratic objective function whose gradients are only accessible through a stochastic oracle that returns the gradient at any given point plus a zero-mean finite variance random error. We present the first algorithm that achieves jointly the optimal prediction error rates for least-squares regression, both in terms of forgetting of initial conditions in O(1/n 2), and in terms of dependence on the noise and dimension d of the problem, as O(d/n). Our new algorithm is based on averaged accelerated regularized gradient descent, and may also be analyzed through finer assumptions on initial conditions and the Hessian matrix, leading to dimension-free quantities that may still be small while the " optimal " terms above are large. In order to characterize the tightness of these new bounds, we consider an application to non-parametric regression and use the known lower bounds on the statistical performance (without computational limits), which happen to match our bounds obtained from a single pass on the data and thus show optimality of our algorithm in a wide variety of particular trade-offs between bias and variance.


page 1

page 2

page 3

page 4


Accelerated SGD for Non-Strongly-Convex Least Squares

We consider stochastic approximation for the least squares regression pr...

Optimal learning rates for Kernel Conjugate Gradient regression

We prove rates of convergence in the statistical sense for kernel-based ...

Accelerated stochastic approximation with state-dependent noise

We consider a class of stochastic smooth convex optimization problems un...

Multiclass learning with margin: exponential rates with no bias-variance trade-off

We study the behavior of error bounds for multiclass classification unde...

Optimal Statistical Rates for Decentralised Non-Parametric Regression with Linear Speed-Up

We analyse the learning performance of Distributed Gradient Descent in t...

Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization

We consider the smooth convex-concave bilinearly-coupled saddle-point pr...

Robustness of accelerated first-order algorithms for strongly convex optimization problems

We study the robustness of accelerated first-order algorithms to stochas...