Robust Bayesian Cluster Enumeration

A major challenge in cluster analysis is that the number of data clusters is usually unknown and must be estimated before the observed data can be clustered. In real-world applications, the observed data is often subject to heavy-tailed noise and outliers, which obscure the true underlying structure of the data and make estimating the number of clusters difficult. To address this, we derive a robust cluster enumeration criterion by formulating the problem of estimating the number of clusters as maximization of the posterior probability of multivariate t_ν candidate models. We use Bayes' theorem and asymptotic approximations to obtain a robust criterion that possesses a closed-form expression. Further, we refine the derivation and provide a robust cluster enumeration criterion for the finite sample regime. The robust criteria require an estimate of the cluster parameters of each candidate model as input. Hence, we propose a two-step cluster enumeration algorithm that uses the expectation maximization (EM) algorithm to partition the data and estimate the cluster parameters before one of the robust criteria is calculated. The performance of the proposed algorithm is tested and compared to existing cluster enumeration methods using numerical and real data experiments.
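The two-step procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`fit_t_mixture`, `enumerate_clusters`), the fixed degrees of freedom `nu=3`, the farthest-point initialization, and the small covariance regularizer are all assumptions made for illustration. In particular, the score used in step two is a generic BIC-type penalized log-likelihood standing in for the robust criteria derived in the paper, whose closed-form expressions are not given in the abstract.

```python
import numpy as np
from math import lgamma

def t_logpdf(X, mu, Sigma, nu):
    """Log-density of a multivariate t_nu and squared Mahalanobis distances."""
    r = X.shape[1]
    L = np.linalg.cholesky(Sigma)
    z = np.linalg.solve(L, (X - mu).T)            # shape (r, N)
    maha = np.sum(z ** 2, axis=0)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    const = (lgamma((nu + r) / 2) - lgamma(nu / 2)
             - 0.5 * r * np.log(nu * np.pi) - 0.5 * logdet)
    return const - 0.5 * (nu + r) * np.log1p(maha / nu), maha

def _init_centers(X, k):
    """Farthest-point initialization (an assumption, chosen for determinism)."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    return np.array(centers)

def fit_t_mixture(X, k, nu=3.0, n_iter=100):
    """Step 1: EM for a k-component t_nu mixture with fixed nu."""
    X = np.asarray(X, dtype=float)
    N, r = X.shape
    mu = _init_centers(X, k)
    Sigma = np.stack([np.atleast_2d(np.cov(X.T)) + 1e-6 * np.eye(r)] * k)
    tau = np.full(k, 1.0 / k)

    def e_step():
        logp, maha = np.empty((N, k)), np.empty((N, k))
        for m in range(k):
            lp, maha[:, m] = t_logpdf(X, mu[m], Sigma[m], nu)
            logp[:, m] = np.log(tau[m] + 1e-300) + lp
        top = logp.max(axis=1, keepdims=True)
        lse = top + np.log(np.exp(logp - top).sum(axis=1, keepdims=True))
        # Responsibilities, robustness weights, total log-likelihood
        return np.exp(logp - lse), (nu + r) / (nu + maha), float(lse.sum())

    for _ in range(n_iter):
        resp, w, _ = e_step()
        for m in range(k):                         # M-step
            rw = resp[:, m] * w[:, m]              # outliers get small weights
            mu[m] = rw @ X / (rw.sum() + 1e-12)
            diff = X - mu[m]
            Sigma[m] = ((rw[:, None] * diff).T @ diff / (resp[:, m].sum() + 1e-12)
                        + 1e-6 * np.eye(r))
        tau = resp.mean(axis=0)
    return e_step()[2]

def enumerate_clusters(X, candidates, nu=3.0):
    """Step 2: score each candidate model and pick the maximizer.

    The score is a generic BIC-type stand-in, NOT the paper's criterion.
    """
    X = np.asarray(X, dtype=float)
    N, r = X.shape
    scores = {}
    for k in candidates:
        loglik = fit_t_mixture(X, k, nu=nu)
        q = k * (r + r * (r + 1) / 2) + (k - 1)    # free parameters
        scores[k] = loglik - 0.5 * q * np.log(N)
    return max(scores, key=scores.get), scores
```

Calling `enumerate_clusters(X, range(1, 7))` on an `(N, r)` array returns the selected number of clusters together with the score of every candidate. The weights `(nu + r) / (nu + maha)` in the M-step are what make the t_ν fit less sensitive to outliers than a Gaussian fit: points far from a cluster centre contribute less to its mean and scatter estimates.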




A Novel Bayesian Cluster Enumeration Criterion for Unsupervised Learning

The Bayesian Information Criterion (BIC) has been widely used for estima...

Robust M-Estimation Based Bayesian Cluster Enumeration for Real Elliptically Symmetric Distributions

Robustly determining the optimal number of clusters in a data set is an ...

Real Elliptically Skewed Distributions and Their Application to Robust Cluster Analysis

This article proposes a new class of Real Elliptically Skewed (RESK) dis...

VARCLUST: clustering variables using dimensionality reduction

VARCLUST algorithm is proposed for clustering variables under the assump...

Robust Regularized Locality Preserving Indexing for Fiedler Vector Estimation

The Fiedler vector of a connected graph is the eigenvector associated wi...

Pair-Wise Cluster Analysis

This paper studies the problem of learning clusters which are consistent...

Robust Factor Analysis Parameter Estimation

This paper considers the problem of robustly estimating the parameters o...
