Clustering with missing data: which equivalent for Rubin's rules?

11/27/2020
by Vincent Audigier, et al.

Multiple imputation (MI) is a popular method for dealing with missing values. However, how best to apply clustering after MI remains unclear: how should partitions be pooled, and how should clustering instability be assessed when data are incomplete? By answering both questions, this paper proposes a complete view of clustering with missing data using MI. Partition pooling is addressed using consensus clustering and, based on bootstrap theory, we explain how to assess the instability related to both observed and missing data. The new rules for pooling partitions and assessing instability are justified theoretically and studied extensively by simulation. Pooling partitions improves accuracy, while measuring instability with missing data widens the possibilities for data analysis: it allows assessment of the dependence of the clustering on the imputation model, and it provides a convenient way of choosing the number of clusters when data are incomplete, as illustrated on a real data set.
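
To make the idea of pooling partitions concrete, here is a minimal Python sketch. It assumes one partition has already been obtained from each imputed data set, and it pools them with a co-association (majority-vote) consensus. This is one common consensus-clustering technique chosen for illustration; the abstract does not say which consensus method the paper uses, and the function names, the 0.5 threshold and the toy labels are all hypothetical.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of partitions in which each pair of observations shares a cluster."""
    n, m = len(partitions[0]), len(partitions)
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


def consensus_partition(partitions, threshold=0.5):
    """Pool partitions (assumed rule): link two observations when they are
    co-clustered in more than `threshold` of the partitions, then return the
    connected components of that graph as the consensus clusters."""
    n = len(partitions[0])
    co = co_association(partitions)
    parent = list(range(n))  # union-find forest over observations

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel components 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Hypothetical partitions of six observations, one per imputed data set.
# Cluster labels are arbitrary, so the second partition uses swapped labels.
imputed_partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
]
print(consensus_partition(imputed_partitions))  # → [0, 0, 0, 1, 1, 1]
```

Working on pairwise co-membership rather than on the raw labels sidesteps the fact that cluster labels are not comparable across imputed data sets. The third observation, which changes cluster in one of the three partitions, is assigned by majority.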

research
10/24/2021

Imputation of Missing Data Using Linear Gaussian Cluster-Weighted Modeling

Missing data theory deals with the statistical methods in the occurrence...
research
05/15/2022

Inference with Imputed Data: The Allure of Making Stuff Up

Incomplete observability of data generates an identification problem. Th...
research
10/18/2018

Determining the Number of Components in PLS Regression on Incomplete Data

Partial least squares regression---or PLS---is a multivariate method in ...
research
06/08/2021

Clustering with missing data: which imputation model for which cluster analysis method?

Multiple imputation (MI) is a popular method for dealing with missing va...
research
11/19/2020

Robustness to Missing Features using Hierarchical Clustering with Split Neural Networks

The problem of missing data has been persistent for a long time and pose...
research
09/12/2012

Likelihood Estimation with Incomplete Array Variate Observations

Missing data is an important challenge when dealing with high dimensiona...
research
08/28/2022

Leachable Component Clustering

Clustering attempts to partition data instances into several distinctive...
