
A Note on "Assessing Generalization of SGD via Disagreement"

by Andreas Kirsch et al.

Jiang et al. (2021) give empirical evidence that the average test error of deep neural networks can be estimated from the prediction disagreement of two independently trained networks. They also provide a theoretical explanation: this 'Generalization Disagreement Equality' follows from the well-calibrated nature of deep ensembles, under a proposed notion of 'class-aggregated calibration'. In this paper we show that the suggested approach may be impractical, because a deep ensemble's calibration deteriorates under distribution shift, which is exactly when the coupling of test error and disagreement would be of practical value. We present both theoretical and experimental evidence for this, re-derive the theoretical statements from a simple Bayesian perspective, and show that they are straightforward and more general: they apply to any discriminative model, not only to ensembles whose members output one-hot class predictions. The proposed calibration metrics are also equivalent to two metrics introduced by Nixon et al. (2019): 'ACE' and 'SCE'.
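The core empirical claim, that the unlabeled disagreement rate between two independently trained models tracks the labeled test error, can be illustrated with a minimal sketch. Everything below is illustrative, not the authors' setup: the synthetic data, the bootstrap-resampled logistic regressions standing in for two independent SGD runs, and all variable names are assumptions made for the example.

```python
# Illustrative sketch of the Generalization Disagreement Equality (GDE):
# estimate one model's test error from the disagreement of two
# independently trained models, using no test labels.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary classification: label = sign of a noisy linear score.
w_true = np.array([1.5, -2.0])
X = rng.normal(size=(2000, 2))
y = (X @ w_true + 0.5 * rng.normal(size=2000) > 0).astype(int)
X_train, y_train = X[:1000], y[:1000]
X_test, y_test = X[1000:], y[1000:]

def train_logreg(X, y, seed, steps=500, lr=0.5):
    """Gradient-descent logistic regression on a bootstrap resample;
    the seed varies the resample, standing in for two independent runs."""
    rs = np.random.default_rng(seed)
    idx = rs.integers(0, len(X), size=len(X))  # bootstrap sample
    Xb, yb = X[idx], y[idx]
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        w -= lr * Xb.T @ (p - yb) / len(Xb)
    return w

w1 = train_logreg(X_train, y_train, seed=1)
w2 = train_logreg(X_train, y_train, seed=2)

pred1 = (X_test @ w1 > 0).astype(int)
pred2 = (X_test @ w2 > 0).astype(int)

test_error = np.mean(pred1 != y_test)   # true test error of model 1 (needs labels)
disagreement = np.mean(pred1 != pred2)  # disagreement estimate (label-free)
print(f"test error   = {test_error:.3f}")
print(f"disagreement = {disagreement:.3f}")
```

The paper's caveat applies directly to such a sketch: the two quantities are coupled only while the ensemble of runs stays well calibrated on the evaluation distribution, so under distribution shift (e.g., evaluating on X_test drawn from a shifted distribution) the disagreement rate can drift away from the true error.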



