Multi-Scale Process Modelling and Distributed Computation for Spatial Data

07/17/2019
by Andrew Zammit Mangion, et al.

Recent years have seen substantial development in spatial modelling and prediction methodology, driven by the increased availability of remote-sensing data and the reduced cost of distributed-processing technology. It is well known that modelling and prediction using infinite-dimensional process models is not computationally feasible with large data sets, and that both approximate models and, often, approximate-inference methods are needed. The problem of fitting simple global spatial models to large data sets has been solved through methods such as multi-resolution approximations and nearest-neighbour techniques. Here we tackle the next challenge: fitting complex, nonstationary, multi-scale models to large data sets. We propose doing this through superpositions of spatial processes with increasing spatial scale and increasing degrees of nonstationarity. Computation is facilitated through the use of Gaussian Markov random fields and parallel Markov chain Monte Carlo based on graph colouring. The resulting model allows for both distributed computing and distributed data. Importantly, it provides opportunities for genuine model and data scalability while still being able to borrow strength across large spatial scales. We illustrate a two-scale version on a sea-surface temperature data set containing on the order of one million observations, and compare our approach to state-of-the-art spatial modelling and prediction methods.
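To make the idea of parallel MCMC based on graph colouring concrete, here is a minimal sketch of a chromatic Gibbs sampler for a single-scale Gaussian Markov random field. It is not the authors' implementation and does not include the multi-scale superposition. It assumes a regular lattice with four-nearest-neighbour structure, a prior precision Q = kappa*I + (D - A) (D the degree matrix, A the adjacency matrix), and observations y = x + noise with noise variance sigma2; the function names and the values of kappa and sigma2 are hypothetical. A checkerboard colouring ensures that no two cells of the same colour are neighbours, so all cells of one colour are conditionally independent given the other colour and can be updated simultaneously.

```python
import numpy as np


def neighbour_sum(x):
    """Sum of the four nearest neighbours of each cell (no wrap-around)."""
    s = np.zeros_like(x)
    s[1:, :] += x[:-1, :]   # neighbour above
    s[:-1, :] += x[1:, :]   # neighbour below
    s[:, 1:] += x[:, :-1]   # neighbour to the left
    s[:, :-1] += x[:, 1:]   # neighbour to the right
    return s


def chromatic_gibbs(y, kappa=0.1, sigma2=0.5, n_iter=200, seed=0):
    """Checkerboard (2-colour) Gibbs sampler for x | y under an assumed
    lattice GMRF prior with precision Q = kappa*I + (D - A)."""
    rng = np.random.default_rng(seed)
    ii, jj = np.indices(y.shape)
    colours = (ii + jj) % 2                          # checkerboard 2-colouring
    q_diag = kappa + neighbour_sum(np.ones(y.shape))  # Q_ii = kappa + degree
    post_prec = q_diag + 1.0 / sigma2                 # full-conditional precision
    x = np.zeros(y.shape, dtype=float)
    samples = np.empty((n_iter,) + y.shape)
    for it in range(n_iter):
        for c in (0, 1):
            mask = colours == c
            # Cells of colour c depend only on cells of the other colour,
            # which are held fixed during this half-sweep.
            mean = (neighbour_sum(x) + y / sigma2) / post_prec
            draw = mean + rng.standard_normal(y.shape) / np.sqrt(post_prec)
            x[mask] = draw[mask]                     # update one colour class at once
        samples[it] = x
    return samples
```

Here the vectorised update of each colour class stands in for distributing that class across workers; with more general graphs (for example, the irregular neighbourhood structures of multi-scale models) more than two colours would typically be needed.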

research
10/06/2021

Modelling, Fitting, and Prediction with Non-Gaussian Spatial and Spatio-Temporal Data using FRK

Non-Gaussian spatial and spatial-temporal data are becoming increasingly...
research
12/16/2017

Parallel Markov Chain Monte Carlo for Bayesian Hierarchical Models with Big Data, in Two Stages

Due to the escalating growth of big data sets in recent years, new paral...
research
05/03/2023

Comparison of new computational methods for geostatistical modelling of malaria

Geostatistical analysis of health data is increasingly used to model spa...
research
05/13/2023

Indexing and Partitioning the Spatial Linear Model for Large Data Sets

We consider four main goals when fitting spatial linear models: 1) estim...
research
12/20/2021

Approximating Bayes in the 21st Century

The 21st century has seen an enormous growth in the development and use ...
research
04/06/2020

A quasi-Monte Carlo data compression algorithm for machine learning

We introduce an algorithm to reduce large data sets using so-called digi...
research
12/06/2017

A Multi-Resolution Spatial Model for Large Datasets Based on the Skew-t Distribution

Large, non-Gaussian spatial datasets pose a considerable modeling challe...