On the Self-Penalization Phenomenon in Feature Selection

10/12/2021
by Michael I. Jordan, et al.

We describe an implicit sparsity-inducing mechanism based on minimization over a family of kernels: min_{β, f} 𝔼[L(Y, f(β^{1/q} ⊙ X))] + λ_n ‖f‖²_{ℋ_q} subject to β ≥ 0, where L is the loss, ⊙ denotes coordinate-wise multiplication, and ℋ_q is the reproducing kernel Hilbert space induced by the kernel k_q(x, x') = h(‖x − x'‖_q^q), with ‖·‖_q the ℓ_q norm. Using gradient descent to optimize this objective with respect to β leads to exactly sparse stationary points with high probability. The sparsity is achieved without using any of the well-known explicit sparsification techniques, such as penalization (e.g., ℓ_1), early stopping, or post-processing (e.g., clipping). As an application, we use this sparsity-inducing mechanism to build algorithms that are consistent for feature selection.
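The mechanism above can be sketched numerically. The snippet below is a minimal illustration, not the paper's algorithm: it assumes squared-error loss (so the inner minimization over f reduces to the kernel-ridge closed form λ_n yᵀ(K + λ_n n I)⁻¹ y), fixes q = 1 with h(t) = exp(−t), uses a numerical gradient, and adds a projection step to keep β ≥ 0 (the paper obtains exact zeros at stationary points without clipping; the projection here is a simplification). All names and hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 60, 5
X = rng.standard_normal((n, d))
# Only the first two coordinates carry signal.
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.standard_normal(n)

q = 1.0      # exponent in the kernel family (Laplacian-type for q = 1)
lam = 1e-2   # regularization strength λ_n

def kernel_matrix(beta):
    # k_q(x, x') = h(||x - x'||_q^q) with h(t) = exp(-t),
    # applied to coordinate-scaled inputs β^{1/q} ⊙ x.
    Z = X * beta ** (1.0 / q)
    D = np.abs(Z[:, None, :] - Z[None, :, :]) ** q
    return np.exp(-D.sum(axis=-1))

def objective(beta):
    # With squared-error loss, minimizing over f in the RKHS gives the
    # closed form lam * y^T (K + lam * n * I)^{-1} y (kernel ridge).
    K = kernel_matrix(beta)
    alpha = np.linalg.solve(K + lam * n * np.eye(n), y)
    return lam * y @ alpha

def num_grad(beta, eps=1e-5):
    # Central finite differences; fine for this toy dimensionality.
    g = np.zeros_like(beta)
    for j in range(len(beta)):
        e = np.zeros_like(beta)
        e[j] = eps
        g[j] = (objective(beta + e) - objective(beta - e)) / (2 * eps)
    return g

beta = np.ones(d)
eta = 1.0
for _ in range(300):
    # Projected gradient step enforcing the constraint beta >= 0.
    beta = np.maximum(beta - eta * num_grad(beta), 0.0)

print(np.round(beta, 3))
```

On this toy problem the entries of β for the two informative coordinates tend to stay positive while the uninformative ones shrink toward zero, mimicking the self-penalization behavior described in the abstract.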
