Learning High-Dimensional Single-Neuron ReLU Networks with Finite Samples

03/03/2023
by Jingfeng Wu, et al.

This paper considers the problem of learning a single ReLU neuron with squared loss (a.k.a. ReLU regression) in the overparameterized regime, where the input dimension can exceed the number of samples. We analyze a Perceptron-type algorithm called GLM-tron (Kakade et al., 2011) and establish dimension-free risk upper bounds for high-dimensional ReLU regression in both well-specified and misspecified settings. Our risk bounds recover several existing results as special cases. Moreover, in the well-specified setting, we also provide an instance-wise matching risk lower bound for GLM-tron. Together, our upper and lower risk bounds give a sharp characterization of the high-dimensional ReLU regression problems that can be learned via GLM-tron. On the other hand, we provide negative results for stochastic gradient descent (SGD) on ReLU regression with symmetric Bernoulli data: if the model is well-specified, the excess risk of SGD is provably no better than that of GLM-tron, up to constant factors, for every problem instance; and in the noiseless case, GLM-tron can achieve a small risk while SGD unavoidably suffers a constant risk in expectation. These results together suggest that GLM-tron may be preferable to SGD for high-dimensional ReLU regression.
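
The abstract does not reproduce the GLM-tron update rule, so the following is a minimal NumPy sketch of the algorithm from Kakade et al. (2011), specialized to the ReLU link studied in this paper. The synthetic data, dimensions, and iteration count below are illustrative assumptions for the overparameterized (d > n) regime, not details taken from the paper.

import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def glm_tron(X, y, num_iters=200):
    # GLM-tron (Kakade et al., 2011) with a ReLU link function.
    # Perceptron-style full-batch update, with no step size to tune:
    #   w <- w + (1/n) * sum_i (y_i - relu(<w, x_i>)) * x_i
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(num_iters):
        residuals = y - relu(X @ w)      # prediction errors, shape (n,)
        w = w + (X.T @ residuals) / n    # full-batch correction step
    return w

# Illustrative usage: noiseless, well-specified data with d > n
# (the overparameterized regime the paper studies).
rng = np.random.default_rng(0)
n, d = 50, 200
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d) / np.sqrt(d)   # hypothetical ground-truth neuron
y = relu(X @ w_star)
w_hat = glm_tron(X, y)
print("training MSE:", np.mean((relu(X @ w_hat) - y) ** 2))

For contrast, per-sample SGD on the squared loss would update w <- w - eta * (relu(<w, x_i>) - y_i) * 1[<w, x_i> > 0] * x_i. The extra ReLU-derivative factor 1[<w, x_i> > 0] zeroes out updates on inactive samples, which GLM-tron's update omits; this is one plausible intuition, not a claim from the paper, for why the two methods can behave differently in the noiseless setting described above.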

Related research

10/12/2021 · Last Iterate Risk Bounds of SGD with Decaying Stepsize for Overparameterized Linear Regression
Stochastic gradient descent (SGD) has been demonstrated to generalize we...

05/14/2022 · Homogenization of SGD in high-dimensions: Exact dynamics and generalization properties
We develop a stochastic differential equation, called homogenized SGD, f...

03/07/2022 · Risk Bounds of Multi-Pass SGD for Least Squares in the Interpolation Regime
Stochastic gradient descent (SGD) has achieved great success due to its ...

08/14/2018 · Learning ReLU Networks on Linearly Separable Data: Algorithm, Optimality, and Generalization
Neural networks with ReLU activations have achieved great empirical succ...

03/25/2018 · Minimizing Nonconvex Population Risk from Rough Empirical Risk
Population risk---the expectation of the loss over the sampling mechanis...

04/13/2023 · High-dimensional limit of one-pass SGD on least squares
We give a description of the high-dimensional limit of one-pass single-b...

02/05/2019 · Distribution-Dependent Analysis of Gibbs-ERM Principle
Gibbs-ERM learning is a natural idealized model of learning with stochas...
