Ridgeless Interpolation with Shallow ReLU Networks in 1D is Nearest Neighbor Curvature Extrapolation and Provably Generalizes on Lipschitz Functions
We prove a precise geometric description of all one-layer ReLU networks z(x;θ) with a single linear unit and input/output dimensions equal to one that interpolate a given dataset 𝒟={(x_i,f(x_i))} and, among all such interpolants, minimize the ℓ_2-norm of the neuron weights. Such networks can intuitively be thought of as those that minimize the mean-squared error over 𝒟 plus an infinitesimal weight decay penalty. We therefore refer to them as ridgeless ReLU interpolants. Our description proves that, to extrapolate values z(x;θ) for inputs x∈(x_i,x_{i+1}) lying between two consecutive datapoints, a ridgeless ReLU interpolant simply compares the signs of the discrete estimates for the curvature of f at x_i and x_{i+1} derived from the dataset 𝒟. If the curvature estimates at x_i and x_{i+1} have different signs, then z(x;θ) must be linear on (x_i,x_{i+1}). If in contrast the curvature estimates at x_i and x_{i+1} are both positive (resp. negative), then z(x;θ) is convex (resp. concave) on (x_i,x_{i+1}). Our results show that ridgeless ReLU interpolants achieve the best possible generalization for learning 1D Lipschitz functions, up to universal constants.
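To make the sign-comparison rule concrete, the following minimal Python sketch (not the authors' code; all function names are illustrative) estimates the discrete curvature of f at each interior datapoint via second divided differences and then labels each interval between consecutive interior points as linear, convex, or concave according to the rule stated in the abstract.

    import numpy as np

    def discrete_curvatures(x, y):
        # Second divided differences at the interior datapoints x_1, ..., x_{n-2}:
        # (right slope) - (left slope), whose sign serves as a discrete curvature estimate.
        left_slopes = (y[1:-1] - y[:-2]) / (x[1:-1] - x[:-2])
        right_slopes = (y[2:] - y[1:-1]) / (x[2:] - x[1:-1])
        return right_slopes - left_slopes

    def classify_intervals(x, y):
        # For each interval between consecutive interior datapoints, report the
        # qualitative shape the abstract says a ridgeless ReLU interpolant must take.
        curv = discrete_curvatures(x, y)
        labels = []
        for c_left, c_right in zip(curv[:-1], curv[1:]):
            if c_left > 0 and c_right > 0:
                labels.append("convex")
            elif c_left < 0 and c_right < 0:
                labels.append("concave")
            else:  # curvature estimates disagree in sign (or one vanishes)
                labels.append("linear")
        return labels

    # Example: six datapoints whose discrete curvature changes sign once.
    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([0.0, 1.0, 4.0, 5.0, 5.5, 5.6])
    print(classify_intervals(x, y))  # ['linear', 'concave', 'concave']

Here the interval flagged "linear" is the one whose endpoints have curvature estimates of opposite sign; on the remaining intervals both estimates are negative, so the interpolant is concave there.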