Learning a Single Neuron with Adversarial Label Noise via Gradient Descent

06/17/2022
by Ilias Diakonikolas, et al.

We study the fundamental problem of learning a single neuron, i.e., a function of the form 𝐱 ↦ σ(𝐰·𝐱) for monotone activations σ: ℝ ↦ ℝ, with respect to the L_2^2-loss in the presence of adversarial label noise. Specifically, we are given labeled examples from a distribution D on (𝐱, y) ∈ ℝ^d × ℝ such that there exists 𝐰^* ∈ ℝ^d achieving F(𝐰^*) = ϵ, where F(𝐰) = 𝐄_{(𝐱,y)∼D}[(σ(𝐰·𝐱) − y)^2]. The goal of the learner is to output a hypothesis vector 𝐰 such that F(𝐰) = Cϵ with high probability, where C > 1 is a universal constant. As our main contribution, we give efficient constant-factor approximate learners for a broad class of distributions (including log-concave distributions) and activation functions. Concretely, for the class of isotropic log-concave distributions, we obtain the following important corollaries: For the logistic activation, we obtain the first polynomial-time constant-factor approximation (even under the Gaussian distribution). Our algorithm has sample complexity O(d/ϵ), which is tight within polylogarithmic factors. For the ReLU activation, we give an efficient algorithm with sample complexity Õ(d polylog(1/ϵ)). Prior to our work, the best known constant-factor approximate learner had sample complexity Ω̃(d/ϵ). In both of these settings, our algorithms are simple, performing gradient descent on the (regularized) L_2^2-loss. The correctness of our algorithms relies on novel structural results that we establish, showing that (essentially all) stationary points of the underlying non-convex loss are approximately optimal.
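To make the setup concrete, the following is a minimal sketch of plain gradient descent on an empirical, ridge-regularized version of the squared loss F(𝐰) with a ReLU activation. It is not the paper's algorithm: the step size, iteration count, regularization weight `lam`, random initialization, Gaussian inputs, and the randomly corrupted labels (a weak stand-in for adversarial noise) are all assumptions chosen for illustration, and the helper names are hypothetical.

```python
import numpy as np


def relu(z):
    """ReLU activation sigma(z) = max(z, 0)."""
    return np.maximum(z, 0.0)


def relu_grad(z):
    """(Sub)derivative of ReLU: 1 where z > 0, else 0."""
    return (z > 0).astype(float)


def empirical_loss(w, X, y, lam=0.0):
    """Sample average of (sigma(w.x) - y)^2, plus an optional ridge term."""
    residual = relu(X @ w) - y
    return np.mean(residual ** 2) + lam * np.dot(w, w)


def gradient_descent(X, y, lam=1e-3, step=0.1, iters=500, seed=0):
    """Gradient descent on the regularized empirical squared loss.

    lam, step, iters and the random start are illustrative assumptions,
    not values taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = rng.normal(size=d) / np.sqrt(d)  # assumed random initialization
    for _ in range(iters):
        z = X @ w
        residual = relu(z) - y
        # Chain rule: d/dw (sigma(w.x) - y)^2 = 2 (sigma(w.x) - y) sigma'(w.x) x
        grad = 2.0 * (X.T @ (residual * relu_grad(z))) / n + 2.0 * lam * w
        w = w - step * grad
    return w


def make_noisy_data(n=5000, d=10, noise_frac=0.05, seed=1):
    """Gaussian inputs, ReLU labels from a unit vector w_star, and a small
    fraction of labels overwritten at random (assumed noise model)."""
    rng = np.random.default_rng(seed)
    w_star = rng.normal(size=d)
    w_star /= np.linalg.norm(w_star)
    X = rng.normal(size=(n, d))
    y = relu(X @ w_star)
    corrupt = rng.random(n) < noise_frac
    y[corrupt] = rng.normal(size=int(corrupt.sum()))
    return X, y, w_star


if __name__ == "__main__":
    X, y, w_star = make_noisy_data()
    w_hat = gradient_descent(X, y)
    print("loss at w_star:", empirical_loss(w_star, X, y))
    print("loss at w_hat: ", empirical_loss(w_hat, X, y))
```

Comparing `empirical_loss(w_hat, X, y)` with `empirical_loss(w_star, X, y)` gives a rough empirical analogue of the constant-factor guarantee F(𝐰) = Cϵ described above, under these assumed conditions only.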
