On the Power and Limitations of Random Features for Understanding Neural Networks

04/01/2019
by Gilad Yehudai, et al.

Recently, a spate of papers has provided positive theoretical results for training over-parameterized neural networks (where the network size is larger than what is needed to achieve low error). The key insight is that with sufficient over-parameterization, gradient-based methods will implicitly leave some components of the network relatively unchanged, so the optimization dynamics will behave as if those components are essentially fixed at their initial random values. In fact, fixing these components explicitly leads to the well-known approach of learning with random features. In other words, these techniques imply that we can successfully learn with neural networks whenever we can successfully learn with random features. In this paper, we first review these techniques, providing a simple and self-contained analysis for one-hidden-layer networks. We then argue that despite the impressive positive results, random feature approaches are also inherently limited in what they can explain. In particular, we rigorously show that random features cannot be used to learn even a single ReLU neuron with standard Gaussian inputs, unless the network size (or the magnitude of the weights) is exponentially large in the input dimension. Since a single neuron is learnable with gradient-based methods, we conclude that we are still far from a satisfying general explanation for the empirical success of neural networks.
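For concreteness, here is a small numpy sketch (not taken from the paper) of the two approaches the abstract contrasts: a random-features model, where the hidden-layer weights are frozen at their random initialization and only the linear output layer is trained, versus gradient descent applied directly to a single ReLU neuron. All problem sizes below (dimension, width, sample counts, step size) are arbitrary illustrative choices; at this small dimension the random-features fit can still look reasonable, whereas the paper's lower bound concerns how the required width blows up exponentially as the dimension grows.

```python
import numpy as np

# Sketch: fit the target f*(x) = ReLU(<w*, x>) on standard Gaussian inputs in two ways:
#   (1) random features -- frozen random ReLU features plus a trained linear layer;
#   (2) gradient descent directly on a single ReLU neuron.
# All hyperparameters here are illustrative assumptions, not values from the paper.

rng = np.random.default_rng(0)
d, n_train, n_test, width = 20, 5000, 2000, 1000

w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)                  # unit-norm target neuron

X = rng.standard_normal((n_train, d))
y = np.maximum(X @ w_star, 0.0)
X_te = rng.standard_normal((n_test, d))
y_te = np.maximum(X_te @ w_star, 0.0)

# (1) Random features: W and b are sampled once and never updated;
#     only the linear output coefficients are fit (ridge regression here).
W = rng.standard_normal((width, d)) / np.sqrt(d)
b = rng.standard_normal(width)
Phi = np.maximum(X @ W.T + b, 0.0)
coef = np.linalg.solve(Phi.T @ Phi + 1e-3 * np.eye(width), Phi.T @ y)
rf_mse = np.mean((np.maximum(X_te @ W.T + b, 0.0) @ coef - y_te) ** 2)

# (2) Mini-batch gradient descent on a single ReLU neuron w -> ReLU(<w, x>).
w = 0.1 * rng.standard_normal(d)
lr = 0.1
for _ in range(2000):
    idx = rng.integers(0, n_train, size=64)
    Xb, yb = X[idx], y[idx]
    pred = np.maximum(Xb @ w, 0.0)
    grad = Xb.T @ ((pred - yb) * (Xb @ w > 0)) / len(idx)   # subgradient of squared loss
    w -= lr * grad
gd_mse = np.mean((np.maximum(X_te @ w, 0.0) - y_te) ** 2)

print(f"random-features test MSE: {rf_mse:.4f}")
print(f"single-neuron GD test MSE: {gd_mse:.4f}")
```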
