On the role of data in PAC-Bayes bounds

The dominant term in PAC-Bayes bounds is often the Kullback–Leibler divergence between the posterior and prior. For so-called linear PAC-Bayes risk bounds based on the empirical risk of a fixed posterior kernel, it is possible to minimize the expected value of the bound by choosing the prior to be the expected posterior, which we call the oracle prior on the account that it is distribution dependent. In this work, we show that the bound based on the oracle prior can be suboptimal: In some cases, a stronger bound is obtained by using a data-dependent oracle prior, i.e., a conditional expectation of the posterior, given a subset of the training data that is then excluded from the empirical risk term. While using data to learn a prior is a known heuristic, its essential role in optimal bounds is new. In fact, we show that using data can mean the difference between vacuous and nonvacuous bounds. We apply this new principle in the setting of nonconvex learning, simulating data-dependent oracle priors on MNIST and Fashion MNIST with and without held-out data, and demonstrating new nonvacuous bounds in both cases.


page 1

page 2

page 3

page 4


PAC-Bayes bounds for stable algorithms with instance-dependent priors

PAC-Bayes bounds have been proposed to get risk estimates based on a tra...

Chebyshev-Cantelli PAC-Bayes-Bennett Inequality for the Weighted Majority Vote

We present a new second-order oracle bound for the expected risk of a we...

Integral Probability Metrics PAC-Bayes Bounds

We present a PAC-Bayes-style generalization bound which enables the repl...

Optimal PAC-Bayesian Posteriors for Stochastic Classifiers and their use for Choice of SVM Regularization Parameter

PAC-Bayesian set up involves a stochastic classifier characterized by a ...

Probabilistic fine-tuning of pruning masks and PAC-Bayes self-bounded learning

We study an approach to learning pruning masks by optimizing the expecte...

Learning PAC-Bayes Priors for Probabilistic Neural Networks

Recent works have investigated deep learning models trained by optimisin...

The E-Posterior

We develop a representation of a decision maker's uncertainty based on e...

Please sign up or login with your details

Forgot password? Click here to reset