Sharing pattern submodels for prediction with missing values

06/22/2022
by Lena Stempfle, et al.

Missing values are unavoidable in many applications of machine learning and present a challenge both during training and at test time. When variables are missing in recurring patterns, fitting separate pattern submodels has been proposed as a solution. However, independent models do not make efficient use of all available data. Conversely, fitting a shared model to the full data set typically relies on imputation, which may be suboptimal when missingness depends on unobserved factors. We propose an alternative approach, called sharing pattern submodels, which a) makes predictions that are robust to missing values at test time, b) maintains or improves the predictive power of pattern submodels, and c) has a short description, enabling improved interpretability. We identify cases where sharing is provably optimal, even when missingness itself is predictive and when the prediction target depends on unobserved variables. Classification and regression experiments on synthetic data and two healthcare data sets demonstrate that our models achieve a favorable trade-off between pattern specialization and information sharing.
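To make the trade-off between pattern specialization and information sharing concrete, here is a minimal sketch of one way such a model could be set up for regression. It is not the authors' implementation, and the abstract does not specify the parametrization. The sketch assumes a linear model with one shared coefficient vector plus a deviation vector per missingness pattern, and penalizes only the deviations with a ridge penalty (chosen here because it has a closed-form solution). The function names `fit_shared_pattern_ridge` and `predict_shared_pattern` and the parameter `lam` are hypothetical.

```python
import numpy as np


def _design(X):
    """Zero-fill missing entries and append an intercept column."""
    mask = ~np.isnan(X)                      # True where a value is observed
    X0 = np.where(mask, X, 0.0)              # missing features contribute nothing
    X1 = np.hstack([X0, np.ones((X.shape[0], 1))])
    return mask, X1


def fit_shared_pattern_ridge(X, y, lam=1.0):
    """Fit shared coefficients plus per-pattern deviations (assumed form).

    X uses np.nan for missing values. Only the deviations are penalized,
    so a large `lam` approaches one shared model and a small `lam`
    approaches independent pattern submodels.
    """
    mask, X1 = _design(X)
    patterns, inv = np.unique(mask, axis=0, return_inverse=True)
    inv = inv.ravel()
    n, D = X1.shape
    k = len(patterns)

    # Design matrix: [shared block | one block per missingness pattern].
    Z = np.zeros((n, D * (k + 1)))
    Z[:, :D] = X1
    for j in range(k):
        rows = inv == j
        Z[rows, D * (j + 1):D * (j + 2)] = X1[rows]

    # Ridge penalty on the pattern-specific blocks only.
    penalty = np.concatenate([np.zeros(D), np.full(D * k, lam)])
    A = Z.T @ Z + np.diag(penalty)
    w = np.linalg.lstsq(A, Z.T @ y, rcond=None)[0]

    theta = w[:D]                            # shared coefficients (+ intercept)
    deltas = w[D:].reshape(k, D)             # one deviation vector per pattern
    return theta, patterns, deltas


def predict_shared_pattern(X, theta, patterns, deltas):
    """Predict with shared coefficients, adding the deviation of a known pattern.

    Rows whose missingness pattern was not seen in training fall back to
    the shared coefficients alone.
    """
    mask, X1 = _design(X)
    out = X1 @ theta
    for p, delta in zip(patterns, deltas):
        rows = (mask == p).all(axis=1)
        out[rows] += X1[rows] @ delta
    return out
```

The pattern-specific intercept in each deviation vector lets the model pick up a signal from missingness itself, while the penalty pulls patterns with few samples toward the shared coefficients. An L1 penalty on the deviations would give sparser, shorter descriptions, but it would need an iterative solver.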

