Parallel cross-validation: a scalable fitting method for Gaussian process models

12/31/2019
by Florian Gerber, et al.

Gaussian process (GP) models are widely used to analyze spatially referenced data and to predict values at locations without observations. In contrast to many algorithmic procedures, GP models are based on a statistical framework, which enables uncertainty quantification of the model structure and predictions. Both the evaluation of the likelihood and the prediction involve solving linear systems, so the computational cost is high and limits the amount of data that can be handled. While there are many approximation strategies that lower the computational cost of GP models, they often provide only sub-optimal support for the parallel computing capabilities of current (high-performance) computing environments. We aim to bridge this gap with a parameter estimation and prediction method that is designed to be parallelizable. More precisely, we divide the spatial domain into overlapping subsets and use cross-validation (CV) to estimate the covariance parameters in parallel. We present simulation studies that assess the accuracy of the parameter estimates and predictions. Moreover, we show that our implementation has good weak and strong parallel scaling properties. For illustration, we fit an exponential covariance model to a scientifically relevant canopy height dataset with 5 million observations. Using 512 processor cores in parallel brings the evaluation time of one covariance parameter configuration to less than 1.5 minutes. The parallel CV method can be easily extended to include approximate likelihood methods, multivariate and spatio-temporal data, as well as non-stationary covariance models.
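
The idea of scoring one covariance parameter configuration on overlapping spatial subsets in parallel can be illustrated with a small sketch. This is not the authors' implementation; it is a toy example under several assumptions that do not come from the abstract: a 2 x 2 tiling of the unit square with a fixed overlap margin, a 3-fold hold-out inside each subset, squared prediction error as the CV score, a fixed nugget, a zero-mean predictor with unit variance (so only the range parameter `rho` is compared), synthetic data from a smooth function rather than a GP sample, and a thread pool standing in for a real multi-core or MPI setup. All function names are hypothetical.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

NUGGET = 1e-2  # assumed noise variance (illustrative, not from the paper)

def exp_cov(p, q, rho):
    """Exponential covariance with unit variance and range rho."""
    return math.exp(-math.dist(p, q) / rho)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def make_subsets(pts, n_tiles, margin):
    """Tile the unit square; pad each tile by `margin` so neighbours overlap."""
    h = 1.0 / n_tiles
    subsets = []
    for i in range(n_tiles):
        for j in range(n_tiles):
            lo_x, hi_x = i * h - margin, (i + 1) * h + margin
            lo_y, hi_y = j * h - margin, (j + 1) * h + margin
            subsets.append([k for k, (x, y) in enumerate(pts)
                            if lo_x <= x <= hi_x and lo_y <= y <= hi_y])
    return subsets

def cv_score(idx, pts, vals, rho, n_folds=3):
    """Sum of squared hold-out prediction errors within one subset."""
    sse = 0.0
    for f in range(n_folds):
        test = idx[f::n_folds]
        held = set(test)
        train = [k for k in idx if k not in held]
        # Covariance matrix of the training points, nugget on the diagonal.
        K = [[exp_cov(pts[a], pts[b], rho) + (NUGGET if a == b else 0.0)
              for b in train] for a in train]
        w = solve(K, [vals[a] for a in train])
        for t in test:
            pred = sum(exp_cov(pts[t], pts[a], rho) * wa
                       for a, wa in zip(train, w))
            sse += (vals[t] - pred) ** 2
    return sse

def parallel_cv(pts, vals, rho, subsets, workers=4):
    """Score one range value: subsets are independent, so map them to workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda idx: cv_score(idx, pts, vals, rho), subsets))

# Toy data: a smooth surface plus noise, centred to match the zero-mean predictor.
random.seed(0)
pts = [(random.random(), random.random()) for _ in range(120)]
raw = [math.sin(3 * x) + math.cos(2 * y) + random.gauss(0, 0.1) for x, y in pts]
mean = sum(raw) / len(raw)
vals = [v - mean for v in raw]

subsets = make_subsets(pts, n_tiles=2, margin=0.1)
scores = {rho: parallel_cv(pts, vals, rho, subsets) for rho in (0.05, 0.2, 0.8)}
best_rho = min(scores, key=scores.get)  # candidate range with the lowest CV error
```

Because each subset only needs its own points, the per-subset linear systems stay small and the work distributes without communication until the final sum. Squared error alone cannot identify the variance parameter; a scoring rule that also uses the predictive variance would be needed for that.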

research
11/01/2017

Locally stationary spatio-temporal interpolation of Argo profiling float data

Argo floats measure sea water temperature and salinity in the upper 2,00...
research
12/24/2020

Kryging: Geostatistical analysis of large-scale datasets using Krylov subspace methods

Analyzing massive spatial datasets using Gaussian process model poses co...
research
04/29/2021

MuyGPs: Scalable Gaussian Process Hyperparameter Estimation Using Local Cross-Validation

Gaussian processes (GPs) are non-linear probabilistic models popular in ...
research
01/08/2021

Fast calculation of Gaussian Process multiple-fold cross-validation residuals and their covariances

We generalize fast Gaussian process leave-one-out formulae to multiple-f...
research
08/10/2023

Exploring the Efficacy of Statistical and Deep Learning Methods for Large Spatial Datasets: A Case Study

Increasingly large and complex spatial datasets pose massive inferential...
research
02/13/2019

Wireless Traffic Prediction with Scalable Gaussian Process: Framework, Algorithms, and Verification

The cloud radio access network (C-RAN) is a promising paradigm to meet t...
research
01/11/2022

A penalised piecewise-linear model for non-stationary extreme value analysis of peaks over threshold

Metocean extremes often vary systematically with covariates such as dire...
