On the Optimal Weighted ℓ_2 Regularization in Overparameterized Linear Regression

06/10/2020
by Denny Wu, et al.

We consider the linear model 𝐲 = 𝐗β_⋆ + ϵ with 𝐗 ∈ ℝ^{n×p} in the overparameterized regime p > n. We estimate β_⋆ via generalized (weighted) ridge regression: β̂_λ = (𝐗^T𝐗 + λΣ_w)^†𝐗^T𝐲, where Σ_w is the weighting matrix. Assuming a random effects model with general data covariance Σ_x and an anisotropic prior on the true coefficients β_⋆, i.e., 𝔼[β_⋆β_⋆^T] = Σ_β, we provide an exact characterization of the prediction risk 𝔼(y - 𝐱^Tβ̂_λ)^2 in the proportional asymptotic limit p/n → γ ∈ (1,∞). Our general setup leads to a number of interesting findings. We outline precise conditions that determine the sign of the optimal setting λ_opt of the ridge parameter λ, and confirm the implicit ℓ_2 regularization effect of overparameterization, which theoretically justifies the surprising empirical observation that λ_opt can be negative in the overparameterized regime. We also characterize the double descent phenomenon for principal component regression (PCR) when 𝐗 and β_⋆ are non-isotropic. Finally, we determine the optimal weighting matrix Σ_w for both the ridgeless (λ → 0) and optimally regularized (λ = λ_opt) cases, and demonstrate the advantage of the weighted objective over standard ridge regression and PCR.
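To make the setup concrete, the following short NumPy sketch (not the authors' code) simulates the random effects model and evaluates the weighted ridge estimator β̂_λ = (𝐗^T𝐗 + λΣ_w)^†𝐗^T𝐲 for a few choices of λ and Σ_w. The dimensions, covariance spectra, noise level, and the particular Σ_w candidates below are illustrative assumptions, not the optimal quantities derived in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 400                      # overparameterized regime: p > n (gamma = p/n = 2)

# Illustrative anisotropic data covariance Sigma_x and prior covariance Sigma_beta.
Sigma_x    = np.diag(np.linspace(0.5, 2.0, p))
Sigma_beta = np.diag(np.linspace(2.0, 0.5, p)) / p   # scaled so the signal energy is O(1)

# Random effects model: y = X beta_* + eps with E[beta_* beta_*^T] = Sigma_beta.
beta_star = rng.multivariate_normal(np.zeros(p), Sigma_beta)
X = rng.standard_normal((n, p)) @ np.linalg.cholesky(Sigma_x).T
noise_var = 0.25
y = X @ beta_star + np.sqrt(noise_var) * rng.standard_normal(n)

def weighted_ridge(X, y, lam, Sigma_w):
    """Generalized (weighted) ridge estimator: (X^T X + lam * Sigma_w)^† X^T y."""
    return np.linalg.pinv(X.T @ X + lam * Sigma_w) @ (X.T @ y)

def prediction_risk(beta_hat, beta_star, Sigma_x, noise_var):
    """E(y - x^T beta_hat)^2 = (beta_hat - beta_*)^T Sigma_x (beta_hat - beta_*) + noise_var."""
    d = beta_hat - beta_star
    return float(d @ Sigma_x @ d) + noise_var

# Compare standard ridge (Sigma_w = I) with one weighted candidate (Sigma_w proportional to
# Sigma_beta^{-1}, a natural choice under the anisotropic prior; illustrative only).
candidates = [("identity", np.eye(p)), ("prior-inverse", np.linalg.inv(Sigma_beta) / p)]
for name, Sigma_w in candidates:
    for lam in [1e-6, 0.1, 1.0]:     # lam -> 0 approximates the ridgeless limit
        risk = prediction_risk(weighted_ridge(X, y, lam, Sigma_w), beta_star, Sigma_x, noise_var)
        print(f"Sigma_w={name:13s}  lambda={lam:6.2g}  risk={risk:.4f}")
```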
