# Stochastic Gradient Descent applied to Least Squares regularizes in Sobolev spaces

We study the behavior of stochastic gradient descent applied to $\|Ax - b\|_2^2 \to \min$ for invertible $A \in \mathbb{R}^{n \times n}$. We show that there is an explicit constant $c_A$ depending (mildly) on $A$ such that

$$\mathbb{E}\, \|Ax_{k+1} - b\|_2^2 \leq \left(1 + \frac{c_A}{\|A\|_F^2}\right) \|Ax_k - b\|_2^2 - \frac{2}{\|A\|_F^2} \left\|A^T (Ax_k - b)\right\|_2^2.$$

This is a curious inequality: when applied to a discretization of a partial differential equation like $-\Delta u = f$, the last term measures the regularity of the residual $u_k - u$ in a higher Sobolev space than the remaining terms: if $u_k - u$ has large fourth derivatives (i.e., a large bi-Laplacian $\Delta^2$), then SGD will dramatically decrease the size of the second derivatives (i.e., $\Delta$) of $u_k - u$. For symmetric matrices, this inequality has an extension to higher-order Sobolev spaces. This implies a regularization phenomenon: an energy cascade from large singular values to small singular values acts as a regularizer.
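The inequality can be checked numerically on a small example. The sketch below is illustrative only and rests on two assumptions that the abstract does not state: that the SGD step is the row-sampling update used in randomized Kaczmarz-type methods (pick row $a_i$ with probability $\|a_i\|_2^2 / \|A\|_F^2$ and project $x_k$ onto the hyperplane $\langle a_i, x \rangle = b_i$), and that $c_A = \max_i \|A a_i\|_2^2 / \|a_i\|_2^2$, which is the constant obtained by expanding the expectation under that update. The helper names (`sgd_step`, `expected_next_residual`, `upper_bound`, `laplacian_1d`) are hypothetical, and the test matrix is an unscaled one-dimensional finite-difference Laplacian chosen to echo the $-\Delta u = f$ example.

```python
import random


def sq_norm(v):
    """Squared Euclidean norm of a vector given as a list of floats."""
    return sum(t * t for t in v)


def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * t for m, t in zip(row, v)) for row in M]


def residual(A, b, x):
    """Residual A x - b."""
    return [u - v for u, v in zip(matvec(A, x), b)]


def sgd_step(A, b, x, i):
    """Assumed update rule: project x onto the hyperplane <a_i, y> = b_i."""
    a_i = A[i]
    scale = (sum(a * t for a, t in zip(a_i, x)) - b[i]) / sq_norm(a_i)
    return [t - scale * a for t, a in zip(x, a_i)]


def expected_next_residual(A, b, x):
    """Exact E ||A x_{k+1} - b||^2 over rows with p_i = ||a_i||^2 / ||A||_F^2."""
    frob2 = sum(sq_norm(row) for row in A)
    return sum(
        sq_norm(row) / frob2 * sq_norm(residual(A, b, sgd_step(A, b, x, i)))
        for i, row in enumerate(A)
    )


def upper_bound(A, b, x):
    """Right-hand side of the inequality, with the assumed constant
    c_A = max_i ||A a_i||^2 / ||a_i||^2."""
    frob2 = sum(sq_norm(row) for row in A)
    c_A = max(sq_norm(matvec(A, row)) / sq_norm(row) for row in A)
    r = residual(A, b, x)
    At_r = matvec([list(col) for col in zip(*A)], r)  # A^T (A x - b)
    return (1 + c_A / frob2) * sq_norm(r) - 2 / frob2 * sq_norm(At_r)


def laplacian_1d(n):
    """Unscaled tridiagonal (-1, 2, -1) finite-difference matrix."""
    return [
        [2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
        for i in range(n)
    ]


rng = random.Random(0)  # fixed seed so the run is reproducible
A = laplacian_1d(8)
b = [rng.uniform(-1, 1) for _ in range(8)]
x0 = [0.0] * 8

print("E||A x_1 - b||^2 :", expected_next_residual(A, b, x0))
print("upper bound      :", upper_bound(A, b, x0))
```

The expectation is computed by summing over all rows rather than by sampling, so the comparison is free of Monte Carlo noise; for large $n$ one would sample rows instead.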
