Scalable GWR: A linear-time algorithm for large-scale geographically weighted regression with polynomial kernels

by Daisuke Murakami et al.

Although a number of studies have developed fast geographically weighted regression (GWR) algorithms for large samples, none of them achieves the linear-time estimation that is considered requisite for big data analysis in machine learning, geostatistics, and related domains. Against this backdrop, this study proposes a scalable GWR (ScaGWR) for large datasets. The key development is to calibrate the model after pre-compressing the matrices and vectors whose size depends on the sample size, before running leave-one-out cross-validation (LOOCV), which is the heaviest computational step in conventional GWR. This pre-compression makes the computation time of the proposed GWR extension increase linearly with sample size, whereas conventional GWR algorithms take at most quasi-quadratic-order time. With this development, the ScaGWR can be calibrated with more than one million samples without parallelization. Moreover, the ScaGWR estimator can be regarded as an empirical Bayesian estimator that is more stable than the conventional GWR estimator. This study compares the ScaGWR with the conventional GWR in terms of estimation accuracy, predictive accuracy, and computational efficiency using a Monte Carlo simulation. We then apply these methods to a residential land analysis in the Tokyo Metropolitan Area. The code for ScaGWR is available in the R package scgwr and is going to be incorporated into another R package, GWmodel.
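The pre-compression idea can be illustrated with a small toy sketch. It is not the paper's formulation or the scgwr implementation. It assumes a single covariate without an intercept, a Gaussian base kernel with a fixed bandwidth, and local weights that are a polynomial in that base kernel, w_ij(b) = sum over p of b^p * k_ij^p. The names `base_kernel`, `precompress`, and `loocv_score` are hypothetical. Under these assumptions the sample-size-dependent sums are computed once, and each candidate value of b is then scored from the stored moments alone. For brevity the sketch sums over all other points, so the pre-compression step itself is quadratic here.

```python
import math


def base_kernel(distance, bandwidth=1.0):
    """Gaussian base kernel (an assumed choice for this sketch)."""
    return math.exp(-(distance / bandwidth) ** 2)


def precompress(coords, x, y, degree):
    """Compute leave-one-out moments once for every location.

    M[i][p] stores sum_j k_ij^(p+1) * x_j^2 and
    m[i][p] stores sum_j k_ij^(p+1) * x_j * y_j, with j != i.
    """
    n = len(x)
    M = [[0.0] * degree for _ in range(n)]
    m = [[0.0] * degree for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if j == i:
                continue  # leave observation i out of its own fit
            k = base_kernel(math.dist(coords[i], coords[j]))
            k_power = 1.0
            for p in range(degree):
                k_power *= k
                M[i][p] += k_power * x[j] * x[j]
                m[i][p] += k_power * x[j] * y[j]
    return M, m


def loocv_score(b, M, m, x, y):
    """Sum of squared leave-one-out errors for polynomial coefficient b.

    Only the stored moments are used; no sum over the sample is repeated.
    """
    score = 0.0
    for i in range(len(x)):
        numerator = sum(b ** (p + 1) * m[i][p] for p in range(len(m[i])))
        denominator = sum(b ** (p + 1) * M[i][p] for p in range(len(M[i])))
        score += (y[i] - x[i] * numerator / denominator) ** 2
    return score


# Made-up toy data, used only to exercise the functions.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

M, m = precompress(coords, x, y, degree=3)
candidates = [0.5, 1.0, 2.0]
best_b = min(candidates, key=lambda b: loocv_score(b, M, m, x, y))
```

Scoring another candidate b reuses `M` and `m` unchanged, which is the point of compressing before cross-validation. A linear-time variant would also need to restrict each sum to a fixed number of nearest neighbors, which this sketch does not do.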



A memory-free spatial additive mixed modeling for big spatial data

This study develops a spatial additive mixed modeling (AMM) approach est...

Sketching in Bayesian High Dimensional Regression With Big Data Using Gaussian Scale Mixture Priors

Bayesian computation of high dimensional linear regression models with a...

A linear time algorithm for multiscale quantile simulation

Change-point problems have appeared in a great many applications for exa...

Compressing Large Sample Data for Discriminant Analysis

Large-sample data became prevalent as data acquisition became cheaper an...

Spatially varying coefficient modeling for large datasets: Eliminating N from spatial regressions

While spatially varying coefficient (SVC) modeling is popular in applied...

confidence-planner: Easy-to-Use Prediction Confidence Estimation and Sample Size Planning

Machine learning applications, especially in the fields of medicine an...

SGMM: Stochastic Approximation to Generalized Method of Moments

We introduce a new class of algorithms, Stochastic Generalized Method of...
