Distributed nonparametric regression imputation for missing response problems with large-scale data

06/04/2021
by Ruoyu Wang, et al.

Nonparametric regression imputation is commonly used in missing data analysis. However, it suffers from the "curse of dimensionality". The explosive growth of sample sizes in the era of big data alleviates this problem, but large-scale data pose challenges for storing the data and computing the estimators. These challenges make classical nonparametric regression imputation methods inapplicable, which motivates us to develop two distributed nonparametric imputation methods: one based on kernel smoothing and the other on the sieve method. The kernel-based distributed imputation method has extremely low communication cost, and the sieve-based distributed imputation method can accommodate more local machines. To illustrate the proposed imputation methods, we consider estimation of the response mean. We propose two distributed nonparametric imputation estimators of the response mean and prove that they are asymptotically normal, with asymptotic variances achieving the semiparametric efficiency bound. The proposed methods are evaluated through simulation studies and illustrated with a real data analysis.
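
To make the idea of distributed regression imputation concrete, the following is a minimal sketch and not the estimator from the paper. It assumes a one-dimensional covariate, a Gaussian kernel with a fixed bandwidth, and a simple divide-and-conquer scheme in which each machine imputes its own missing responses with a Nadaraya-Watson fit and sends a single number to the center. The function names, the bandwidth, and the simulated data are all hypothetical.

```python
import math
import random


def local_imputed_mean(xs, ys, observed, bandwidth):
    """Regression-imputation estimate of E[Y] from one machine's data.

    xs: covariates; ys: responses (ignored where observed is False);
    observed: True where the response was recorded.
    Assumes at least one observed response lies within kernel range.
    """
    total = 0.0
    for x0, y0, d0 in zip(xs, ys, observed):
        if d0:
            total += y0  # observed responses are used as they are
            continue
        # Nadaraya-Watson fit at x0 using only the observed responses
        num = den = 0.0
        for x, y, d in zip(xs, ys, observed):
            if d:
                w = math.exp(-0.5 * ((x - x0) / bandwidth) ** 2)
                num += w * y
                den += w
        total += num / den  # imputed value for the missing response
    return total / len(xs)


def distributed_imputed_mean(blocks, bandwidth):
    """Size-weighted average of the local estimates (one scalar per machine)."""
    n_total = sum(len(xs) for xs, _, _ in blocks)
    weighted = sum(
        len(xs) * local_imputed_mean(xs, ys, obs, bandwidth)
        for xs, ys, obs in blocks
    )
    return weighted / n_total


# Simulated example: 4 hypothetical machines with 500 records each.
random.seed(0)
blocks = []
for _ in range(4):
    xs = [random.random() for _ in range(500)]
    ys = [1.0 + x + random.gauss(0.0, 0.1) for x in xs]
    # The response is missing at random given x: larger x is observed more often.
    obs = [random.random() < 0.5 + 0.4 * x for x in xs]
    ys = [y if d else None for y, d in zip(ys, obs)]
    blocks.append((xs, ys, obs))

estimate = distributed_imputed_mean(blocks, bandwidth=0.1)
print(estimate)  # the true mean of Y in this simulation is 1.5
```

Each machine in this sketch communicates one scalar and its sample size, which is one simple way to keep communication cost low. The kernel-based and sieve-based procedures in the paper may aggregate different quantities.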

research
01/29/2021

Statistical Inference after Kernel Ridge Regression Imputation under item nonresponse

Imputation is a popular technique for handling missing data. We consider...
research
09/28/2022

Nonparametric augmented probability weighting with sparsity

Nonresponse frequently arises in practice, and simply ignoring it may le...
research
07/15/2021

Statistical inference using Regularized M-estimation in the reproducing kernel Hilbert space for handling missing data

Imputation and propensity score weighting are two popular techniques for...
research
06/12/2023

Nonparametric empirical Bayes biomarker imputation and estimation

Biomarkers are often measured in bulk to diagnose patients, monitor pati...
research
05/04/2021

Modern Subsampling Methods for Large-Scale Least Squares Regression

Subsampling methods aim to select a subsample as a surrogate for the obs...
research
10/04/2021

Internal Data Imputation in Data Warehouse Dimensions

Missing values occur commonly in the multidimensional data warehouses. T...
research
06/22/2010

Large gaps imputation in remote sensed imagery of the environment

Imputation of missing data in large regions of satellite imagery is nece...