Handling missing data in model-based clustering

06/04/2020
by Alessio Serafini, et al.

Gaussian mixture models (GMMs) are a powerful tool for clustering, classification and density estimation when the data contain cluster structure. Missing values can substantially affect GMM estimation, so handling them is crucial in all three tasks. Several techniques have been developed to impute missing values before model estimation; among these, multiple imputation is a simple and useful general approach. In this paper we propose two methods for fitting Gaussian mixtures in the presence of missing data. Both use a variant of the Monte Carlo Expectation-Maximisation (MCEM) algorithm for data augmentation: multiple imputations are performed during the E-step, followed by the standard M-step for a given eigen-decomposed component-covariance matrix. We show that the proposed methods outperform the multiple imputation approach in terms of both cluster identification and density estimation.
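
To make the idea of imputing during the E-step concrete, here is a minimal sketch of one Monte Carlo imputation pass for a Gaussian mixture with fixed parameters. It is not the authors' implementation: the function name `mc_impute`, its arguments, and the use of `NaN` to mark missing entries are assumptions made for illustration. For each incomplete row it draws a component from the posterior given the observed coordinates, then draws the missing coordinates from that component's conditional Gaussian.

```python
import numpy as np

def mc_impute(X, weights, means, covs, n_draws=5, seed=0):
    """Return n_draws completed copies of X (NaN marks a missing entry).

    Hypothetical helper: parameters (weights, means, covs) are held fixed,
    as they would be inside a single E-step.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    K = len(weights)
    draws = np.repeat(X[None, :, :], n_draws, axis=0)
    for i in range(n):
        m = np.isnan(X[i])            # mask of missing coordinates
        if not m.any():
            continue
        o = ~m                        # mask of observed coordinates
        # Log posterior over components given the observed part
        # (terms constant across components are dropped).
        logp = np.log(weights).astype(float)
        if o.any():
            for k in range(K):
                diff = X[i, o] - means[k][o]
                S_oo = covs[k][np.ix_(o, o)]
                _, logdet = np.linalg.slogdet(S_oo)
                logp[k] += -0.5 * (logdet + diff @ np.linalg.solve(S_oo, diff))
        p = np.exp(logp - logp.max())
        p /= p.sum()
        for r in range(n_draws):
            k = rng.choice(K, p=p)    # sample a component label
            mu_m = means[k][m]
            S_mm = covs[k][np.ix_(m, m)]
            if o.any():
                # Conditional Gaussian of missing given observed coordinates.
                S_mo = covs[k][np.ix_(m, o)]
                S_oo = covs[k][np.ix_(o, o)]
                gain = S_mo @ np.linalg.inv(S_oo)
                mu_m = mu_m + gain @ (X[i, o] - means[k][o])
                S_mm = S_mm - gain @ S_mo.T
                S_mm = (S_mm + S_mm.T) / 2   # guard against round-off asymmetry
            draws[r, i, m] = rng.multivariate_normal(mu_m, S_mm)
    return draws
```

In a full MCEM loop, the completed copies would feed a standard M-step that averages the sufficient statistics over the draws; that step, and the eigen-decomposed covariance constraints mentioned in the abstract, are omitted here.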

research
09/16/2018

Semiparametric fractional imputation using Gaussian mixture models for handling multivariate missing data

Item nonresponse is frequently encountered in practice. Ignoring missing...
research
08/14/2018

Multivariate Density Estimation with Missing Data

Multivariate density estimation is a popular technique in statistics wit...
research
04/11/2020

Handling missing data in a neural network approach for the identification of charged particles in a multilayer detector

Identification of charged particles in a multilayer detector by the ener...
research
09/04/2012

Efficient EM Training of Gaussian Mixtures with Missing Data

In data-mining applications, we are frequently faced with a large fracti...
research
04/17/2018

Hierarchical correlation reconstruction with missing data

Machine learning often needs to estimate density from a multidimensional...
research
04/17/2018

Hierarchical correlation reconstruction with missing data, for example for biology-inspired neuron

Machine learning often needs to estimate density from a multidimensional...
research
12/20/2021

Model-based Clustering with Missing Not At Random Data

In recent decades, technological advances have made it possible to colle...
