Disentangling the Roles of Curation, Data-Augmentation and the Prior in the Cold Posterior Effect

by Lorenzo Noci, et al.

The "cold posterior effect" (CPE) in Bayesian deep learning describes the uncomfortable observation that the predictive performance of Bayesian neural networks can be significantly improved if the Bayes posterior is artificially sharpened using a temperature parameter T<1. The CPE is problematic in theory and practice, and since the effect was identified, many researchers have proposed hypotheses to explain it. However, despite this intensive research effort, the effect remains poorly understood. In this work we provide novel and nuanced evidence relevant to existing explanations for the cold posterior effect, disentangling three hypotheses:

1. The dataset curation hypothesis of Aitchison (2020): we show empirically that the CPE does not arise in a real curated dataset but can be produced in a controlled experiment with varying curation strength.
2. The data augmentation hypothesis of Izmailov et al. (2021) and Fortuin et al. (2021): we show empirically that data augmentation is sufficient but not necessary for the CPE to be present.
3. The bad prior hypothesis of Wenzel et al. (2020): we use a simple experiment evaluating the relative importance of the prior and the likelihood, strongly linking the CPE to the prior.

Our results demonstrate how the CPE can arise in isolation from synthetic curation, data augmentation, and bad priors. Cold posteriors observed "in the wild" are therefore unlikely to arise from a single simple cause; as a result, we do not expect a simple "fix" for cold posteriors.
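To make "sharpening the posterior with a temperature T<1" concrete, the following is a minimal sketch on a toy one-dimensional Gaussian model, not the paper's experimental setup. It assumes the common form in which the unnormalized log posterior (log likelihood plus log prior) is divided by T; the function names, the grid, and the three data points are illustrative assumptions.

```python
import numpy as np

def tempered_posterior(log_lik, log_prior, T):
    """Normalized grid approximation of p_T(theta | D), assumed
    proportional to exp((log_lik + log_prior) / T)."""
    log_post = (log_lik + log_prior) / T
    log_post = log_post - log_post.max()  # subtract max for numerical stability
    weights = np.exp(log_post)
    return weights / weights.sum()

# Toy model (illustrative): unit-variance Gaussian likelihood with unknown
# mean theta, and a standard normal prior on theta.
theta = np.linspace(-5.0, 5.0, 2001)
data = np.array([0.8, 1.1, 1.3])  # made-up observations
log_prior = -0.5 * theta**2
log_lik = -0.5 * ((data[:, None] - theta[None, :]) ** 2).sum(axis=0)

def posterior_std(T):
    """Standard deviation of the tempered posterior on the grid."""
    p = tempered_posterior(log_lik, log_prior, T)
    mean = (p * theta).sum()
    return float(np.sqrt((p * (theta - mean) ** 2).sum()))

std_bayes = posterior_std(1.0)   # T = 1: the ordinary Bayes posterior
std_cold = posterior_std(0.25)   # T < 1: a "cold", sharper posterior
print(std_bayes, std_cold)
```

In this conjugate toy case, tempering scales the posterior variance by T, so the cold posterior concentrates more tightly around the same mean. The CPE is the empirical observation that such sharpened posteriors can predict better in Bayesian neural networks.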

