Variable Selection for Multiply-imputed Data: A Bayesian Framework

by Jungang Zou, et al.

Multiple imputation is a widely used technique for handling missing data in large observational studies. Variable selection on multiply-imputed datasets is problematic, however: if selection is conducted on each imputed dataset separately, different sets of important variables may be obtained. MI-LASSO, one of the most popular solutions to this problem, treats the same variable across all imputed datasets as a group of variables and uses Group-LASSO to yield a consistent variable selection across all the multiply-imputed datasets. In this paper, we extend the MI-LASSO model into a Bayesian framework and use five different Bayesian MI-LASSO models to perform variable selection on multiply-imputed data. Three of these models are based on shrinkage priors and two on discrete mixture priors. We conduct a simulation study investigating the practical characteristics of each model across various settings. We further demonstrate these methods via a case study using the multiply-imputed data from the University of Michigan Dioxin Exposure Study. The Python package BMIselect is hosted on GitHub under an Apache-2.0 license.
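To make the grouping idea behind MI-LASSO concrete, the following is a minimal sketch of a frequentist group-LASSO fit across imputed datasets, solved by proximal gradient descent. It is not the BMIselect API and not one of the Bayesian models from the paper: the function names `group_soft_threshold` and `mi_group_lasso`, the penalty value, the simulated data, and the toy imputation step (random draws from observed values) are all assumptions made for illustration.

```python
import numpy as np

def group_soft_threshold(B, t):
    """Shrink each row of B (one variable across all imputations) toward zero.

    A row is either set to zero entirely or scaled by a positive factor,
    so a variable is dropped from all imputed datasets or kept in all.
    """
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))
    return B * scale

def mi_group_lasso(Xs, ys, lam, n_iter=500):
    """Hypothetical helper: proximal gradient descent for

        sum_d (1 / 2n) * ||y_d - X_d b_d||^2  +  lam * sum_j ||(b_1j, ..., b_Dj)||_2

    Xs: list of D imputed design matrices, each of shape (n, p).
    ys: list of D response vectors, each of shape (n,).
    Returns B of shape (p, D); column d holds the coefficients for dataset d.
    """
    D = len(Xs)
    n, p = Xs[0].shape
    B = np.zeros((p, D))
    # Step size from the largest Lipschitz constant of the smooth part.
    lipschitz = max(np.linalg.norm(X, 2) ** 2 / n for X in Xs)
    step = 1.0 / lipschitz
    for _ in range(n_iter):
        # Gradient of the squared-error term, one column per imputed dataset.
        G = np.column_stack(
            [X.T @ (X @ B[:, d] - y) / n for d, (X, y) in enumerate(zip(Xs, ys))]
        )
        B = group_soft_threshold(B - step * G, step * lam)
    return B

# Simulated example (assumed setup): 5 covariates, only the first two matter.
rng = np.random.default_rng(0)
n, p, D = 200, 5, 3
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.standard_normal(n)

# Make about 30% of column 1 missing, then build D imputed copies.
# Drawing from the observed values is a crude stand-in for a real imputation model.
miss = rng.random(n) < 0.3
Xs, ys = [], []
for _ in range(D):
    X_imp = X.copy()
    X_imp[miss, 1] = rng.choice(X[~miss, 1], size=miss.sum())
    Xs.append(X_imp)
    ys.append(y)

B = mi_group_lasso(Xs, ys, lam=0.5)
selected = np.linalg.norm(B, axis=1) > 0  # one decision per variable
print("selected variables:", np.flatnonzero(selected))
```

Because the penalty acts on the norm of each row of `B`, selection is decided once per variable rather than once per imputed dataset, which is the consistency property the abstract attributes to MI-LASSO. The Bayesian models in the paper replace this penalty with priors; that step is not shown here.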




A comparison of strategies for selecting auxiliary variables for multiple imputation

Multiple imputation (MI) is a popular method for handling missing data. ...

The Reciprocal Bayesian LASSO

A reciprocal LASSO (rLASSO) regularization employs a decreasing penalty ...

Variable selection with multiply-imputed datasets: choosing between stacked and grouped methods

Penalized regression methods, such as lasso and elastic net, are used in...

Signal Adaptive Variable Selector for the Horseshoe Prior

In this article, we propose a simple method to perform variable selectio...

ABM: an automatic supervised feature engineering method for loss based models based on group and fused lasso

A vital problem in solving classification or regression problem is to ap...

Evaluating Effects of Tuition Fees: Lasso for the Case of Germany

We study the effect of the introduction of university tuition fees on th...

Strategies for variable selection in large-scale healthcare database studies with missing covariate and outcome data

Prior work has shown that combining bootstrap imputation with tree-based...
