What is the best predictor that you can compute in five minutes using a given Bayesian hierarchical model?

by Jonathan R. Bradley, et al.

The goal of this paper is to provide a way for statisticians to answer the question posed in the title of this article using any Bayesian hierarchical model of their choosing and without imposing additional restrictive model assumptions. We are motivated by the fact that the rise of “big data” has made it difficult for statisticians to apply their methods directly to big datasets. We add a “data subset model” to the popular “data model, process model, and parameter model” framework used to summarize Bayesian hierarchical models. The hyperparameters of the data subset model are specified constructively: they are chosen so that the implied size of the subset satisfies pre-defined computational constraints. These hyperparameters thus effectively calibrate the statistical model to the computer itself, so that predictions and estimates are obtained in a pre-specified amount of time. Several properties of the data subset model are provided, including propriety, partial sufficiency, and semi-parametric properties. Furthermore, we show that subsets of normally distributed data are asymptotically partially sufficient under reasonable constraints. Results from a simulated dataset are presented across different computers to show the effect of the computer on the statistical analysis. Additionally, we provide a joint spatial analysis of two different environmental datasets.
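To make the idea of calibrating a subset size to a time budget concrete, here is a minimal Python sketch. It is not the paper's procedure. It assumes, purely for illustration, that fitting time grows as a power law in the subset size, `seconds ≈ a * n**b`, and that a few pilot timings on the target computer are available. The function names `fit_power_law` and `largest_subset_size` and the pilot numbers are hypothetical.

```python
import math


def fit_power_law(sizes, seconds):
    """Least-squares fit of log(seconds) = log(a) + b * log(n).

    Assumption for illustration only: run time follows a power law in n.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in seconds]
    k = len(xs)
    x_bar = sum(xs) / k
    y_bar = sum(ys) / k
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(y_bar - b * x_bar)
    return a, b


def largest_subset_size(sizes, seconds, budget_seconds, n_total):
    """Largest subset size whose predicted run time fits within the budget."""
    a, b = fit_power_law(sizes, seconds)
    if b <= 0:
        raise ValueError("pilot timings do not increase with subset size")
    # Invert a * n**b = budget for n, then clamp to the range [1, n_total].
    n_max = int((budget_seconds / a) ** (1.0 / b))
    return max(1, min(n_max, n_total))


# Hypothetical pilot timings measured on the target machine (made-up numbers).
pilot_sizes = [100, 200, 400]
pilot_seconds = [0.01, 0.04, 0.16]

# Five-minute budget, one million observations in the full dataset.
n_subset = largest_subset_size(pilot_sizes, pilot_seconds, 300.0, 1_000_000)
```

Because the pilot timings are measured on a specific machine, rerunning the calibration on a faster or slower computer yields a different subset size for the same budget, which matches the abstract's remark that the computer itself affects the analysis.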




Incorporating Subsampling into Bayesian Models for High-Dimensional Spatial Data

Additive spatial statistical models with weakly stationary process assum...

Divide and Recombine for Large and Complex Data: Model Likelihood Functions using MCMC

In Divide & Recombine (D&R), big data are divided into subsets, each ana...

Goodness-of-Fit Tests for Large Datasets

Nowadays, data analysis in the world of Big Data is connected typically ...

MILO: Model-Agnostic Subset Selection Framework for Efficient Model Training and Tuning

Training deep networks and tuning hyperparameters on large datasets is c...

Global Identifiability Analysis of Statistical Models using an Information-Theoretic Estimator in a Bayesian Framework

An information-theoretic estimator is proposed to assess the global iden...

Hyperparameter Selection for Subsampling Bootstraps

Massive data analysis becomes increasingly prevalent, subsampling method...

Ignorability in Statistical and Probabilistic Inference

When dealing with incomplete data in statistical learning, or incomplete...
