K-bMOM: a robust Lloyd-type clustering algorithm based on bootstrap Median-of-Means

02/10/2020
by Camille Brunet-Saumard, et al.

We propose a new clustering algorithm that is robust to the presence of outliers in the dataset. We perform Lloyd-type iterations with robust estimates of the centroids. More precisely, we build on the idea of median-of-means statistics to estimate the centroids, but allow for replacement while constructing the blocks. We call this methodology the bootstrap median-of-means (bMOM) and prove that, if enough blocks are generated through bootstrap sampling, it has a better breakdown point for mean estimation than the classical median-of-means (MOM), where the blocks form a partition of the dataset. From a clustering perspective, bMOM makes it possible to draw many blocks of a desired size, thus avoiding the possible disappearance of clusters in some blocks, a pitfall that can occur with the partition-based block generation of the classical median-of-means. Experiments on simulated datasets show that the proposed approach, called K-bMOM, performs better than existing robust K-means-based methods. We also recommend that practitioners use such a robust approach to initialize their clustering algorithm.
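
To make the contrast between the two block constructions concrete, here is a minimal sketch for one-dimensional mean estimation only. It is not the authors' implementation: the function names `mom_mean` and `bmom_mean`, their parameters, and the choice to draw each bootstrap block uniformly with replacement are assumptions made for illustration, and the full K-bMOM procedure for centroids is not reproduced here.

```python
import random
import statistics


def mom_mean(data, n_blocks, rng):
    """Classical MOM (illustrative): partition the shuffled data into
    disjoint blocks and return the median of the block means."""
    shuffled = list(data)
    rng.shuffle(shuffled)
    block_size = len(shuffled) // n_blocks  # leftover points are dropped
    block_means = [
        statistics.fmean(shuffled[i * block_size:(i + 1) * block_size])
        for i in range(n_blocks)
    ]
    return statistics.median(block_means)


def bmom_mean(data, n_blocks, block_size, rng):
    """Bootstrap MOM (illustrative): each block is drawn with replacement,
    so the number of blocks and the block size are chosen independently.
    Assumption: uniform sampling with replacement within each block."""
    block_means = [
        statistics.fmean(rng.choices(data, k=block_size))
        for _ in range(n_blocks)
    ]
    return statistics.median(block_means)


if __name__ == "__main__":
    rng = random.Random(0)
    # 95 inliers in [-1, 1] plus 5 gross outliers at 1000.
    sample = [rng.uniform(-1.0, 1.0) for _ in range(95)] + [1000.0] * 5
    print("empirical mean:", statistics.fmean(sample))
    print("MOM  (20 disjoint blocks of 5):", mom_mean(sample, 20, rng))
    print("bMOM (501 blocks of 5):", bmom_mean(sample, 501, 5, rng))
```

In the partition-based version the block size is tied to the number of blocks (roughly n divided by the number of blocks), whereas the bootstrap version lets both be set freely, which is the flexibility the abstract refers to.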
