Highly over-parameterized classifiers generalize since bad solutions are rare

11/07/2022
by Julius Martinetz, et al.

We study the generalization of over-parameterized classifiers for which learning by Empirical Risk Minimization (ERM) leads to zero training error. In these over-parameterized settings there are many global minima with zero training error, some of which generalize better than others. We show that under certain conditions the fraction of "bad" global minima, i.e., those with a true error larger than ϵ, decays to zero exponentially fast with the number n of training samples. The bound depends on the distribution of the true error over the set of classifier functions used for the given classification problem, and does not necessarily depend on the size or complexity (e.g., the number of parameters) of that set. This might explain the unexpectedly good generalization even of highly over-parameterized neural networks. We support our mathematical framework with experiments on a synthetic data set and a subset of MNIST.
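The core intuition behind such a result can be illustrated with a standard counting argument: for any fixed classifier whose true error exceeds ϵ, the probability that it fits n i.i.d. training samples perfectly is at most (1 - ϵ)^n, so interpolating classifiers with large true error become exponentially rare as n grows. Below is a minimal Monte Carlo sketch of this effect in a toy setup of random halfspaces on Gaussian data; the dimension, constants, and classifier family are illustrative assumptions for the sketch, not the paper's construction.

import numpy as np

# Toy illustration (assumed setup, not the paper's experiments):
# linear classifiers (halfspaces through the origin) on standard Gaussian inputs.
rng = np.random.default_rng(0)
d, eps, n_classifiers = 3, 0.1, 200_000
w_star = rng.standard_normal(d)                      # ground-truth halfspace

def true_error(W, w_star):
    # For halfspaces on rotation-invariant (Gaussian) inputs, the true error of w
    # equals the angle between w and w_star divided by pi.
    cos = (W @ w_star) / (np.linalg.norm(W, axis=1) * np.linalg.norm(w_star))
    return np.arccos(np.clip(cos, -1.0, 1.0)) / np.pi

for n in (5, 10, 20, 40, 80):
    X = rng.standard_normal((n, d))                  # training inputs
    y = np.sign(X @ w_star)                          # noiseless labels
    W = rng.standard_normal((n_classifiers, d))      # random draws from the classifier set
    zero_train_err = np.all(np.sign(X @ W.T) == y[:, None], axis=0)
    W_interp = W[zero_train_err]
    if len(W_interp) == 0:
        print(f"n={n:3d}: no interpolating classifiers found in the sample")
        continue
    frac_bad = np.mean(true_error(W_interp, w_star) > eps)
    print(f"n={n:3d}: {len(W_interp):6d} interpolators, "
          f"fraction with true error > {eps}: {frac_bad:.3f}")

As n increases, the printed fraction of "bad" interpolating classifiers should shrink rapidly, mirroring the exponential decay with n described in the abstract.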
