Adaptive Ensemble Learning of Spatiotemporal Processes with Calibrated Predictive Uncertainty: A Bayesian Nonparametric Approach
Ensemble learning is a mainstay in modern data science practice. Conventional ensemble algorithms assign to base models a set of deterministic, constant model weights that (1) do not fully account for individual models' varying accuracy across data subgroups, nor (2) provide uncertainty estimates for the ensemble prediction. These shortcomings can yield predictions that are precise but biased, which can negatively impact the performance of the algorithm in real-word applications. In this work, we present an adaptive, probabilistic approach to ensemble learning using a transformed Gaussian process as a prior for the ensemble weights. Given input features, our method optimally combines base models based on their predictive accuracy in the feature space, and provides interpretable estimates of the uncertainty associated with both model selection, as reflected by the ensemble weights, and the overall ensemble predictions. Furthermore, to ensure that this quantification of the model uncertainty is accurate, we propose additional machinery to non-parametrically model the ensemble's predictive cumulative density function (CDF) so that it is consistent with the empirical distribution of the data. We apply the proposed method to data simulated from a nonlinear regression model, and to generate a spatial prediction model and associated prediction uncertainties for fine particle levels in eastern Massachusetts, USA.
READ FULL TEXT