Imputation of Missing Data Using Linear Gaussian Cluster-Weighted Modeling

Missing data theory studies statistical methods for handling missing data, which occur when the values of some variables of interest are not observed or recorded. However, most statistical theory assumes that the data are fully observed. One way to handle incomplete databases is to fill in the entries corresponding to the missing information according to some criterion; this technique is called imputation. We introduce a new imputation methodology for databases with univariate missing patterns that draws on additional information from fully observed auxiliary variables. We assume that the unobserved variable is continuous and that the auxiliary variables help improve the imputation capacity of the model. In a fully Bayesian framework, our method uses a flexible mixture of multivariate normal distributions to model the response and the auxiliary variables jointly. Under this framework, we use the properties of Gaussian Cluster-Weighted Modeling to construct a predictive model that imputes the missing values using information from the covariates. Simulation studies and a real-data illustration are presented to show the method's imputation capacity under a variety of scenarios and in comparison with other methods from the literature.
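The predictive step described above can be sketched as follows, assuming the mixture parameters are already known. Under a joint Gaussian mixture on (x, y), the conditional distribution of y given x is itself a mixture of linear Gaussian regressions, which is the property that Gaussian Cluster-Weighted Modeling exploits. Component k has mean mu_y,k + S_yx,k S_xx,k^(-1) (x - mu_x,k), variance S_yy,k - S_yx,k S_xx,k^(-1) S_xy,k, and weight proportional to pi_k N(x; mu_x,k, S_xx,k). The function name impute_cwm and all parameter values below are hypothetical illustrations, not the authors' implementation. In a fully Bayesian treatment, the fixed parameters would be replaced by posterior draws (e.g. from MCMC) and the imputation repeated for each draw.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log-density of a multivariate normal evaluated at each row of x."""
    d = x.shape[1]
    diff = x - mean
    chol = np.linalg.cholesky(cov)
    sol = np.linalg.solve(chol, diff.T)  # L^{-1} (x - mean), shape (d, n)
    maha = np.sum(sol**2, axis=0)        # squared Mahalanobis distances
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)


def impute_cwm(x_obs, weights, means, covs, rng=None):
    """Hypothetical sketch: impute y from x under a joint Gaussian mixture.

    In each component, the first p coordinates are the auxiliary variables x
    and the last coordinate is the response y. Parameters are assumed known.
    Returns the conditional mean of y and one random draw per row of x_obs.
    """
    x_obs = np.atleast_2d(x_obs)
    n, p = x_obs.shape
    K = len(weights)
    log_w = np.empty((n, K))
    cond_mean = np.empty((n, K))
    cond_var = np.empty(K)
    for k in range(K):
        mu_x, mu_y = means[k][:p], means[k][p]
        S_xx = covs[k][:p, :p]
        S_yx = covs[k][p, :p]
        S_yy = covs[k][p, p]
        beta = np.linalg.solve(S_xx, S_yx)  # local regression coefficients
        cond_mean[:, k] = mu_y + (x_obs - mu_x) @ beta
        cond_var[k] = S_yy - S_yx @ beta
        # Unnormalized log-weight: pi_k * N(x; mu_x,k, S_xx,k)
        log_w[:, k] = np.log(weights[k]) + gaussian_logpdf(x_obs, mu_x, S_xx)
    # Normalize the component weights in log space for numerical stability.
    log_w -= log_w.max(axis=1, keepdims=True)
    w = np.exp(log_w)
    w /= w.sum(axis=1, keepdims=True)
    y_mean = np.sum(w * cond_mean, axis=1)
    # Stochastic imputation: pick a component, then draw from its regression.
    rng = np.random.default_rng() if rng is None else rng
    comp = np.array([rng.choice(K, p=w[i]) for i in range(n)])
    y_draw = rng.normal(cond_mean[np.arange(n), comp], np.sqrt(cond_var[comp]))
    return y_mean, y_draw


# Illustrative two-component example with one auxiliary variable (made-up values).
weights = [0.5, 0.5]
means = [np.array([0.0, 0.0]), np.array([5.0, 10.0])]
covs = [np.array([[1.0, 0.8], [0.8, 1.0]]),
        np.array([[1.0, -0.5], [-0.5, 1.0]])]
x_missing_rows = np.array([[0.0], [5.0]])
y_mean, y_draw = impute_cwm(x_missing_rows, weights, means, covs,
                            rng=np.random.default_rng(0))
```

Returning both the conditional mean and a random draw reflects a common design choice: the mean gives a single point imputation, while repeated draws propagate uncertainty, as in multiple imputation.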


