Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model

by Raphaël Berthier et al.

In the context of statistical supervised learning, the noiseless linear model assumes that there exists a deterministic linear relation Y = ⟨θ_*, Φ(U)⟩ between the random output Y and the random feature vector Φ(U), a potentially non-linear transformation of the inputs U. We analyze the convergence of single-pass, fixed step-size stochastic gradient descent on the least-squares risk under this model. The convergence of the iterates to the optimum θ_* and the decay of the generalization error follow polynomial rates whose exponents depend on the regularities of both the optimum θ_* and the feature vectors Φ(u). We interpret our result in the reproducing kernel Hilbert space framework; as a special case, we analyze an online algorithm for estimating a real function on the unit interval from noiseless observations of its values at randomly sampled points. The convergence rate depends on the Sobolev smoothness of the function and of a chosen kernel. Finally, we apply our analysis beyond the supervised learning setting to obtain convergence rates for the averaging process (a.k.a. gossip algorithm) on a graph, depending on its spectral dimension.
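To make the setting concrete, here is a minimal simulation sketch of the algorithm the abstract describes: single-pass, fixed step-size SGD on the least-squares risk when the outputs follow a noiseless linear model. The feature spectrum (a polynomially decaying covariance), the dimension, and the step size below are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 50  # finite-dimensional stand-in for the (possibly infinite) feature space

# Hypothetical optimum theta_* and features with polynomially decaying covariance,
# mimicking the "regularity of the feature vectors" that drives the rates.
theta_star = rng.standard_normal(d) / np.sqrt(d)

def sample():
    x = rng.standard_normal(d) / np.arange(1, d + 1)  # eigenvalues ~ 1/i^2
    y = x @ theta_star  # noiseless linear model: Y = <theta_*, Phi(U)>
    return x, y

gamma = 0.2          # fixed step size (illustrative)
theta = np.zeros(d)  # SGD iterate
n = 10_000           # single pass: each sample is seen exactly once

for _ in range(n):
    x, y = sample()
    theta -= gamma * (theta @ x - y) * x  # gradient step on (⟨theta, x⟩ - y)^2 / 2

err = np.linalg.norm(theta - theta_star)
```

Because there is no additive noise, the error contracts multiplicatively at every step and `err` decays polynomially in `n`, with the high-frequency coordinates (small covariance eigenvalues) converging slowest.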


Last iterate convergence of SGD for Least-Squares in the Interpolation regime

Motivated by the recent successes of neural networks that have the abili...

Convergence Rates for Stochastic Approximation on a Boundary

We analyze the behavior of projected stochastic gradient descent focusin...

Optimal Convergence for Distributed Learning with Stochastic Gradient Methods and Spectral-Regularization Algorithms

We study generalization properties of distributed algorithms in the sett...

On Convergence-Diagnostic based Step Sizes for Stochastic Gradient Descent

Constant step-size Stochastic Gradient Descent exhibits two phases: a tr...

Online Regularized Learning Algorithm for Functional Data

In recent years, functional linear models have attracted growing attenti...

Differential Equations for Modeling Asynchronous Algorithms

Asynchronous stochastic gradient descent (ASGD) is a popular parallel op...

On Structured Filtering-Clustering: Global Error Bound and Optimal First-Order Algorithms

In recent years, the filtering-clustering problems have been a central t...
