Spatial prediction of apartment rent using regression-based and machine learning-based approaches with a large dataset

by Takahiro Yoshida et al.

Employing a large dataset (up to the order of n = 10^6), this study attempts to enhance the literature comparing regression-based and machine learning (ML)-based rent price prediction models, both by adding new empirical evidence and by considering the spatial dependence of the observations. The regression-based approach incorporates the nearest neighbor Gaussian process (NNGP) model, which makes kriging applicable to large datasets. In contrast, the ML-based approach uses three typical models: extreme gradient boosting (XGBoost), random forest (RF), and deep neural network (DNN). The out-of-sample prediction accuracy of these models was compared using Japanese apartment rent data with sample sizes of varying order (i.e., n = 10^4, 10^5, 10^6). The results showed that, as the sample size increased, XGBoost and RF outperformed NNGP in out-of-sample prediction accuracy. XGBoost achieved the highest prediction accuracy for all sample sizes and all error measures, on both the logarithmic and real scales, and for all price bands (when n = 10^5 and 10^6). A comparison of several methods for accounting for spatial dependence in RF showed that simply adding the spatial coordinates to the explanatory variables may be sufficient.
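NNGP makes kriging feasible for large n by letting each location depend on only a small set of nearest neighbors instead of on all observations. As a rough illustration only, the sketch below performs simple kriging of a zero-mean spatial residual at a new location using just its m nearest observed neighbors. The exponential covariance, the parameter names and values (`sigma2`, `phi`, `tau2`, `m`), and the function names are assumptions for illustration. The sketch does not estimate parameters or implement the full NNGP likelihood used in the study.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def exp_cov(a: Point, b: Point, sigma2: float, phi: float) -> float:
    """Assumed exponential covariance: sigma2 * exp(-distance / phi)."""
    return sigma2 * math.exp(-math.dist(a, b) / phi)


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def nn_kriging_predict(
    coords: Sequence[Point],
    residuals: Sequence[float],
    target: Point,
    m: int = 10,
    sigma2: float = 1.0,
    phi: float = 1.0,
    tau2: float = 0.1,
) -> Tuple[float, float]:
    """Hypothetical helper: krige a zero-mean residual at `target`
    from its m nearest neighbors. Returns (mean, variance)."""
    # Linear scan for the m nearest locations; a k-d tree would be
    # preferable when n is of the order 10^6.
    nbrs = heapq.nsmallest(
        m, range(len(coords)), key=lambda i: math.dist(coords[i], target)
    )
    # Neighbor covariance matrix with nugget tau2 (measurement noise) on the diagonal.
    C = [
        [exp_cov(coords[i], coords[j], sigma2, phi) + (tau2 if i == j else 0.0)
         for j in nbrs]
        for i in nbrs
    ]
    # Covariance between each neighbor and the target location.
    c0 = [exp_cov(coords[i], target, sigma2, phi) for i in nbrs]
    w = solve(C, c0)  # kriging weights
    mean = sum(wi * residuals[i] for wi, i in zip(w, nbrs))
    var = sigma2 - sum(wi * ci for wi, ci in zip(w, c0))  # latent-process variance
    return mean, var
```

For example, `nn_kriging_predict(coords, residuals, (1.0, 0.0), m=3, tau2=0.0)` reproduces the observed residual at an already-observed site, while a location far from every neighbor falls back to mean 0 and variance `sigma2`. Restricting each prediction to m neighbors replaces an n-by-n solve with an m-by-m solve, which is the property that lets neighbor-based Gaussian processes scale to large datasets.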
