UAFS: Uncertainty-Aware Feature Selection for Problems with Missing Data

04/02/2019
by Andrew J. Becker, et al.

Missing data are a concern in many real-world data sets, and imputation methods are often needed to estimate the missing values. However, data sets with excessive missingness and high dimensionality challenge most approaches to imputation. Here we show that appropriate feature selection can be an effective preprocessing step for imputation, allowing for more accurate imputations and, in turn, more accurate model predictions. The key feature of this preprocessing is that it incorporates uncertainty: by accounting for the uncertainty due to missingness when selecting features, we can reduce the degree of missingness while also limiting the number of uninformative features used to build predictive models. We introduce a method for uncertainty-aware feature selection (UAFS), provide a theoretical motivation for it, and test UAFS on both real and synthetic problems, demonstrating that it improves the accuracy of imputations across a variety of data sets and levels of missingness. The improved imputations also yield improved prediction accuracy when these imputed data sets are used for supervised learning. UAFS is general and can be fruitfully coupled with a variety of imputation methods.
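The abstract does not spell out the UAFS algorithm, so what follows is only a hypothetical sketch of the general idea it describes, not the authors' method: a feature's relevance is estimated on its observed rows alone, and that estimate is trusted less when fewer rows are observed. As a stand-in relevance score it assumes the absolute Pearson correlation with the target, with uncertainty expressed through a Fisher z confidence interval whose width grows as the number of observed values shrinks; a feature is kept only if the interval excludes zero. The function name `select_features_with_uncertainty` and the parameters `z` and `min_obs` are illustrative assumptions.

```python
import numpy as np


def select_features_with_uncertainty(X, y, z=1.96, min_obs=5):
    """Hypothetical sketch (not the UAFS algorithm from the paper).

    Keeps column j of X only if its correlation with y, measured on the
    rows where column j is observed (non-NaN), is distinguishable from
    zero once sampling uncertainty is taken into account.
    min_obs must be at least 4 so that n - 3 stays positive.
    """
    selected = []
    for j in range(X.shape[1]):
        mask = ~np.isnan(X[:, j])  # rows where feature j is observed
        n = int(mask.sum())
        if n < min_obs:
            continue  # too few observations to estimate relevance at all
        x_obs, y_obs = X[mask, j], y[mask]
        if np.std(x_obs) == 0 or np.std(y_obs) == 0:
            continue  # correlation is undefined for a constant column
        r = np.corrcoef(x_obs, y_obs)[0, 1]
        # Fisher z-transform: its standard error is 1/sqrt(n - 3), so the
        # interval widens as the feature's missingness grows.
        fz = np.arctanh(np.clip(r, -0.999999, 0.999999))
        se = 1.0 / np.sqrt(n - 3)
        if abs(fz) > z * se:  # confidence interval fz +/- z*se excludes zero
            selected.append(j)
    return selected


# Usage on a small deterministic example (values are illustrative only).
y = np.arange(100, dtype=float)
X = np.full((100, 5), np.nan)
X[:, 0] = y                                 # fully observed, perfectly informative
X[:3, 1] = [1.0, 2.0, 3.0]                  # only 3 observed values: skipped
X[:, 2] = 7.0                               # constant: skipped
X[:, 3] = (y - 49.5) ** 2                   # fully observed, uncorrelated with y
X[:6, 4] = [2.0, 0.0, 1.0, 4.0, 5.0, 3.0]   # r ~ 0.66, but only 6 rows observed
print(select_features_with_uncertainty(X, y))  # prints [0]
```

In this sketch a heavily missing feature must show a much stronger association to survive than a fully observed one: with 6 observed rows a correlation of about 0.66 is rejected, while the same correlation measured on 100 rows would pass. A fuller method would also weigh how much each kept feature adds to the overall missingness of the data set, which this sketch does not attempt.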
