Clustering Multivariate Data using Factor Analytic Bayesian Mixtures with an Unknown Number of Components

06/02/2019
by   Panagiotis Papastamoulis, et al.
0

Recent work on overfitting Bayesian mixtures of distributions offers a powerful framework for clustering multivariate data using a latent Gaussian model which resembles the factor analysis model. The flexibility provided by overfitting mixture models yields a simple and efficient way in order to estimate the unknown number of clusters and model parameters by Markov chain Monte Carlo (MCMC) sampling. The present study extends this approach by considering a set of eight parameterizations, giving rise to parsimonious representations of the covariance matrix per cluster. A Gibbs sampler combined with a prior parallel tempering scheme is implemented in order to approximately sample from the posterior distribution of the overfitting mixture. The parameterization and number of factors is selected according to the Bayesian Information Criterion. Identifiability issues related to label switching are dealt by post-processing the simulated output with the Equivalence Classes Representatives algorithm. The contributed method and software are demonstrated and compared to similar models estimated using the Expectation-Maximization algorithm on simulated and real datasets. The software is available online at https://CRAN.R-project.org/package=fabMix.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset