Reliable ABC model choice via random forests

by Pierre Pudlo, et al.

Approximate Bayesian computation (ABC) methods provide an elaborate approach to Bayesian inference on complex models, including model choice. Both theoretical arguments and simulation experiments indicate, however, that model posterior probabilities may be poorly evaluated by standard ABC techniques. We propose a novel approach based on a machine learning tool named random forests to conduct selection among the highly complex models covered by ABC algorithms. We thus modify the way Bayesian model selection is both understood and operated, in that we rephrase the inferential goal as a classification problem, first predicting the model that best fits the data with random forests and postponing the approximation of the posterior probability of the predicted MAP to a second stage, also relying on random forests. Compared with earlier implementations of ABC model choice, the ABC random forest approach offers several potential improvements: (i) it often has a larger discriminative power among the competing models, (ii) it is more robust against the number and choice of statistics summarizing the data, (iii) the computing effort is drastically reduced (with at least a fifty-fold gain in computational efficiency), and (iv) it includes an approximation of the posterior probability of the selected model. The call to random forests will undoubtedly extend the range of dataset sizes and model complexities that ABC can handle. We illustrate the power of this novel methodology by analyzing controlled experiments as well as genuine population genetics datasets. The proposed methodologies are implemented in the R package abcrf, available on CRAN.
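The two-stage scheme described above can be sketched on a toy problem. The code below is an illustrative Python reconstruction, not the authors' abcrf implementation: stage 1 trains a random forest classifier on a simulated reference table of (model index, summary statistics) pairs and predicts the MAP model for the observed data; stage 2 regresses the out-of-bag misclassification indicator on the summaries, so that one minus the regression prediction at the observed summaries approximates the posterior probability of the selected model. The two competing models (Normal vs. Laplace) and the summary statistics are arbitrary choices made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(0)
n_sim = 4000          # size of the simulated reference table
sample_size = 50      # size of each simulated dataset

# Toy competing models: model 0 ~ Normal(0, 1), model 1 ~ Laplace(0, 1).
# Summary statistics (arbitrary for this sketch): mean, variance, mean |x|.
def summaries(x):
    return np.array([x.mean(), x.var(), np.abs(x).mean()])

def simulate(model):
    draw = rng.normal if model == 0 else rng.laplace
    return summaries(draw(0.0, 1.0, sample_size))

models = rng.integers(0, 2, n_sim)
stats = np.array([simulate(m) for m in models])

# Stage 1: treat model choice as classification of the model index
# from the summary statistics.
clf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
clf.fit(stats, models)

x_obs = rng.laplace(0.0, 1.0, sample_size)   # stand-in for the observed data
s_obs = summaries(x_obs).reshape(1, -1)
map_model = int(clf.predict(s_obs)[0])

# Stage 2: regress the out-of-bag misclassification indicator on the
# summaries; 1 - prediction at s_obs approximates P(MAP model | data).
oob_vote = np.argmax(clf.oob_decision_function_, axis=1)
misclassified = (oob_vote != models).astype(float)
reg = RandomForestRegressor(n_estimators=500, random_state=0)
reg.fit(stats, misclassified)
post_prob = 1.0 - float(reg.predict(s_obs)[0])

print(f"selected model: {map_model}, approx. posterior prob.: {post_prob:.3f}")
```

Note that the second forest predicts a local error rate rather than relying on the classifier's vote frequencies, which (as the paper argues) are not reliable estimates of posterior probabilities.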




