What's a good imputation to predict with missing values?

06/01/2021
by Marine Le Morvan, et al.

How can one learn a good predictor on data with missing values? Most efforts focus on first imputing as well as possible and then learning on the completed data to predict the outcome. Yet this widespread practice has no theoretical grounding. Here we show that for almost all imputation functions, an impute-then-regress procedure with a powerful learner is Bayes optimal. This result holds for all missing-value mechanisms, in contrast with the classic statistical results that require missing-at-random settings to use imputation in probabilistic modeling. Moreover, it implies that perfect conditional imputation may not be needed for good prediction asymptotically. In fact, we show that on perfectly imputed data the best regression function will generally be discontinuous, which makes it hard to learn. Crafting the imputation instead so as to leave the regression function unchanged simply shifts the problem to learning discontinuous imputations. Rather, we suggest that it is easier to learn imputation and regression jointly. We propose such a procedure, adapting NeuMiss, a neural network that captures the conditional links across observed and unobserved variables whatever the missing-value pattern. Experiments with a finite number of samples confirm that joint imputation and regression through NeuMiss performs better than various two-step procedures.
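
For readers unfamiliar with the two-step baseline the abstract discusses, here is a minimal sketch of a generic impute-then-regress pipeline. It assumes column-mean imputation and ordinary least squares via NumPy; the function names are hypothetical. A linear model is not the "powerful learner" the Bayes-optimality result refers to, and this is neither the authors' setup nor NeuMiss. It only shows the structure: impute first, then fit a regressor on the completed data.

```python
import numpy as np


def impute_then_regress_fit(X, y):
    """Step 1: mean-impute the training data. Step 2: fit least squares on it.

    Hypothetical helper for illustration; missing entries are np.nan.
    """
    mask = np.isnan(X)
    # Imputation statistics are learned on the training data only.
    col_means = np.nanmean(X, axis=0)
    X_imp = np.where(mask, col_means, X)
    # Design matrix: intercept column followed by the imputed features.
    design = np.hstack([np.ones((X.shape[0], 1)), X_imp])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return col_means, coef


def impute_then_regress_predict(X, col_means, coef):
    """Apply the same (training) imputation, then the fitted regressor."""
    mask = np.isnan(X)
    X_imp = np.where(mask, col_means, X)
    design = np.hstack([np.ones((X.shape[0], 1)), X_imp])
    return design @ coef


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = 1.0 + 2.0 * X[:, 0] - X[:, 1]
    X[::5, 1] = np.nan  # make every fifth value of the second feature missing
    means, coef = impute_then_regress_fit(X, y)
    print(impute_then_regress_predict(X[:3], means, coef))
```

The imputation statistics are computed once on training data and reused at prediction time, so train and test points are completed the same way. A common variant appends the missingness mask as extra features, which lets even a simple regressor react to which values were imputed.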

research
10/26/2022

Imputation of missing values in multi-view data

When missing values occur in multi-view data, all features in a view are...
research
01/27/2020

Predicting Regression Probability Distributions with Imperfect Data Through Optimal Transformations

The goal of regression analysis is to predict the value of a numeric out...
research
04/07/2020

Learning Individual Models for Imputation (Technical Report)

Missing numerical values are prevalent, e.g., owing to unreliable sensor...
research
02/19/2019

On the consistency of supervised learning with missing values

In many application settings, the data are plagued with missing features...
research
06/22/2022

Sharing pattern submodels for prediction with missing values

Missing values are unavoidable in many applications of machine learning ...
research
04/08/2022

Controllable Missingness from Uncontrollable Missingness: Joint Learning Measurement Policy and Imputation

Due to the cost or interference of measurement, we need to control measu...
research
11/09/2019

Missing Features Reconstruction and Its Impact on Classification Accuracy

In real-world applications, we can encounter situations when a well-trai...
