On Markov chain Monte Carlo methods for tall data

05/11/2015
by   Rémi Bardenet, et al.

Markov chain Monte Carlo methods are often deemed too computationally intensive to be of any practical use for big data applications, and in particular for inference on datasets containing a large number n of individual data points, also known as tall datasets. In scenarios where data are assumed independent, various approaches to scale up the Metropolis-Hastings algorithm in a Bayesian inference context have recently been proposed in machine learning and computational statistics. These approaches can be grouped into two categories: divide-and-conquer approaches and subsampling-based algorithms. The aims of this article are as follows. First, we present a comprehensive review of the existing literature, commenting on the underlying assumptions and theoretical guarantees of each method. Second, by leveraging our understanding of the limitations of existing methods, we propose an original subsampling-based approach which samples from a distribution provably close to the posterior distribution of interest, yet can require fewer than O(n) data point likelihood evaluations at each iteration for certain statistical models in favourable scenarios. Finally, we note that we have so far only been able to propose subsampling-based methods that perform well in scenarios where the Bernstein-von Mises approximation of the target posterior distribution is excellent. It remains an open challenge to develop such methods in scenarios where the Bernstein-von Mises approximation is poor.
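
To make the O(n) per-iteration cost mentioned above concrete, here is a minimal sketch of a plain random-walk Metropolis-Hastings sampler. It is not the subsampling approach proposed in the article. The Gaussian model with unit variance, the flat prior, and the names `log_posterior` and `metropolis_hastings` are assumptions chosen purely for illustration.

```python
import math
import random


def log_posterior(theta, data):
    # Assumed toy model: x_i ~ N(theta, 1) with a flat prior on theta.
    # The sum has one term per data point, so each call costs O(n).
    return -0.5 * sum((x - theta) ** 2 for x in data)


def metropolis_hastings(data, n_iter=2000, step=0.05, theta0=0.0, seed=0):
    rng = random.Random(seed)
    theta = theta0
    current = log_posterior(theta, data)
    samples = []
    for _ in range(n_iter):
        # Symmetric Gaussian random-walk proposal.
        proposal = theta + rng.gauss(0.0, step)
        # Full pass over the data at every iteration: the bottleneck
        # that divide-and-conquer and subsampling methods try to avoid.
        candidate = log_posterior(proposal, data)
        accept_prob = math.exp(min(0.0, candidate - current))
        if rng.random() < accept_prob:
            theta, current = proposal, candidate
        samples.append(theta)
    return samples


# Synthetic tall-ish dataset (hypothetical values, fixed seed).
data_rng = random.Random(1)
data = [data_rng.gauss(1.0, 1.0) for _ in range(1000)]
samples = metropolis_hastings(data)
```

With n data points and T iterations, this sampler performs on the order of n × T likelihood evaluations, which is the cost that the approaches reviewed in the article aim to reduce.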


research
08/30/2021

Lagged couplings diagnose Markov chain Monte Carlo phylogenetic inference

Phylogenetic inference is an intractable statistical problem on a comple...
research
01/14/2015

Unbiased Bayes for Big Data: Paths of Partial Posteriors

A key quantity of interest in Bayesian inference are expectations of fun...
research
02/06/2023

Sampling-Based Accuracy Testing of Posterior Estimators for General Inference

Parameter inference, i.e. inferring the posterior distribution of the pa...
research
10/14/2021

Divide-and-Conquer Monte Carlo Fusion

Combining several (sample approximations of) distributions, which we ter...
research
07/14/2017

Big Data vs. complex physical models: a scalable inference algorithm

The data torrent unleashed by current and upcoming instruments requires ...
research
03/24/2019

A Fast Particle-Based Approach for Calibrating a 3-D Model of the Antarctic Ice Sheet

We consider the scientifically challenging and policy-relevant task of u...
research
09/10/2021

Diagnostics for Monte Carlo Algorithms for Models with Intractable Normalizing Functions

Models with intractable normalizing functions have numerous applications...
