Good linear classifiers are abundant in the interpolating regime

by Ryan Theisen, et al.

Within the machine learning community, the widely used uniform convergence framework seeks to answer the question of how complex models such as modern neural networks can generalize well to new data. This approach bounds the test error of the worst-case model one could have fit to the data, which presents fundamental limitations. In this paper, we revisit the statistical mechanics approach to learning, which instead attempts to understand the behavior of the typical model. To quantify this typicality in the setting of over-parameterized linear classification, we develop a methodology to compute the full distribution of test errors among interpolating classifiers. We apply our method to compute this distribution for several real and synthetic datasets. We find that in many regimes of interest, an overwhelming proportion of interpolating classifiers have good test performance, even though, as we demonstrate, classifiers with very high test error do exist. This shows that the behavior of the worst-case model can deviate substantially from that of the typical model. Furthermore, we observe that for a given training set and testing distribution, there is a critical value ε^* > 0 which is typical, in the sense that nearly all test errors eventually concentrate around it. Based on these empirical results, we study this phenomenon theoretically under simplifying assumptions on the data, and we derive simple asymptotic expressions for both the distribution of test errors and the critical value ε^*. Both of these results qualitatively reproduce our empirical findings. Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice, and that approaches based on the statistical mechanics of learning offer a promising alternative.
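To make the phrase "distribution of test errors among interpolating classifiers" concrete, the following is a minimal Python sketch, not the methodology developed in the paper. It assumes synthetic Gaussian data labeled by a hypothetical ground-truth direction `w_star`, and it draws interpolating linear classifiers with a naive scheme: random positive margins solved through the pseudoinverse, plus a random component in the null space of the training matrix. This scheme does not sample uniformly from the set of interpolators; the function name, parameters, and sampling choices are all illustrative assumptions.

```python
import numpy as np


def sample_interpolating_errors(n_train=20, n_test=2000, d=40,
                                n_samples=500, seed=0):
    """Empirical test errors of randomly drawn interpolating linear classifiers.

    Assumptions (illustrative only): isotropic Gaussian features, labels given
    by the sign of a hypothetical ground-truth direction w_star, and n_train < d
    so that many linear classifiers fit the training labels exactly.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical ground-truth direction that generates the labels.
    w_star = rng.standard_normal(d)
    w_star /= np.linalg.norm(w_star)

    X_train = rng.standard_normal((n_train, d))
    y_train = np.sign(X_train @ w_star)
    X_test = rng.standard_normal((n_test, d))
    y_test = np.sign(X_test @ w_star)

    # Pseudoinverse of the training matrix (shape d x n_train) and the
    # projector onto its null space.
    X_pinv = np.linalg.pinv(X_train)
    P_null = np.eye(d) - X_pinv @ X_train

    errors = np.empty(n_samples)
    for k in range(n_samples):
        # Strictly positive margins guarantee y_i * <x_i, w> > 0 on the
        # training set, i.e. zero training error.
        margins = rng.uniform(0.5, 1.5, size=n_train)
        w = X_pinv @ (y_train * margins) + P_null @ rng.standard_normal(d)

        if not np.all(y_train * (X_train @ w) > 0):
            raise RuntimeError("sampled classifier does not interpolate")

        # Fraction of test points the sampled classifier misclassifies.
        errors[k] = np.mean(np.sign(X_test @ w) != y_test)
    return errors


if __name__ == "__main__":
    errs = sample_interpolating_errors()
    # Summarize the empirical distribution of test errors.
    print("mean:", errs.mean(), "min:", errs.min(), "max:", errs.max())
```

A histogram of the returned array gives an empirical picture of how test errors spread across the sampled interpolators. Because the sampling scheme is arbitrary, the shape of that histogram should not be read as a reproduction of the paper's results.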




Rethinking generalization requires revisiting old ideas: statistical mechanics approaches and complex learning behavior

We describe an approach to understand the peculiar and counterintuitive ...

The generalization error of max-margin linear classifiers: High-dimensional asymptotics in the overparametrized regime

Modern machine learning models are often so complex that they achieve va...

Rethinking Generalisation

In this paper, we present a new approach to computing the generalisation...

Out-of-Distribution Generalization in Kernel Regression

In real world applications, data generating process for training a machin...

How bad is worst-case data if you know where it comes from?

We introduce a framework for studying how distributional assumptions on ...

Only Tails Matter: Average-Case Universality and Robustness in the Convex Regime

The recently developed average-case analysis of optimization methods all...

Defuse: Harnessing Unrestricted Adversarial Examples for Debugging Models Beyond Test Accuracy

We typically compute aggregate statistics on held-out test data to asses...
