Multiple imputation using dimension reduction techniques for high-dimensional data

05/13/2019
by Domonique W. Hodge, et al.

Missing data present challenges in data analysis. Naive analyses such as complete-case and available-case analysis may introduce bias, lose efficiency, and produce unreliable results. Multiple imputation (MI) is one of the most widely used methods for handling missing data, which can be partly attributed to its ease of use. However, the MI methods implemented in most statistical software are not applicable to, or do not perform well in, high-dimensional settings where the number of predictors is large relative to the sample size. To remedy this issue, we develop an MI approach that uses dimension reduction techniques. Specifically, when constructing imputation models for high-dimensional data, our approach uses sure independence screening (SIS) followed by either sparse principal component analysis (sPCA) or sufficient dimension reduction (SDR) techniques. Our simulation studies, conducted for high-dimensional data, demonstrate that using SIS followed by sPCA to perform MI outperforms the other imputation methods considered, including several existing imputation approaches. We apply our approach to the analysis of gene expression data from a prostate cancer study.
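
The screen-then-reduce-then-impute idea described above can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single incomplete variable `y` (missing entries coded as `np.nan`) and a fully observed predictor matrix `X`. It uses ordinary PCA as a stand-in for sPCA, and it approximates parameter uncertainty with a bootstrap rather than a formal posterior draw. The function names `sis_screen`, `pca_scores`, and `impute_multiple`, and the defaults `d`, `k`, and `m`, are hypothetical.

```python
import numpy as np

def sis_screen(X, y, d):
    """Keep the d columns of X with the largest absolute marginal correlation with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    # Constant columns get correlation 0 instead of a division error.
    corr = np.abs(Xc.T @ yc) / np.where(denom > 0, denom, np.inf)
    return np.sort(np.argsort(corr)[::-1][:d])

def pca_scores(X, k):
    """Scores on the first k principal components (plain PCA, standing in for sPCA)."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]

def impute_multiple(X, y, d=10, k=2, m=5, seed=0):
    """Return m completed copies of y; np.nan marks the missing entries."""
    rng = np.random.default_rng(seed)
    obs = ~np.isnan(y)
    obs_idx = np.flatnonzero(obs)
    # Step 1: screen predictors using the observed cases only.
    keep = sis_screen(X[obs], y[obs], d)
    # Step 2: reduce the screened predictors to k component scores (plus intercept).
    Z = np.column_stack([np.ones(len(y)), pca_scores(X[:, keep], k)])
    completed = []
    for _ in range(m):
        # Step 3: a bootstrap refit approximates uncertainty in the imputation model.
        boot = rng.choice(obs_idx, size=obs_idx.size, replace=True)
        beta, *_ = np.linalg.lstsq(Z[boot], y[boot], rcond=None)
        resid = y[boot] - Z[boot] @ beta
        sigma = np.sqrt(np.sum(resid ** 2) / max(boot.size - k - 1, 1))
        # Step 4: impute as prediction plus random residual noise.
        y_imp = y.copy()
        n_mis = int((~obs).sum())
        y_imp[~obs] = Z[~obs] @ beta + rng.normal(0.0, sigma, size=n_mis)
        completed.append(y_imp)
    return completed
```

Each of the `m` completed copies would then be analyzed separately and the results pooled, as in standard MI. Swapping `pca_scores` for a sparse PCA or SDR routine would bring the sketch closer to the approach described in the abstract.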


