A review on Bayesian model-based clustering
Clustering is an important task in many areas of knowledge, including medicine and epidemiology, genomics, environmental science, economics, and visual sciences. Methods for inferring the number of clusters have often been proven inconsistent, and introducing a dependence structure among the clusters adds further difficulties to the estimation process. In a Bayesian setting, clustering is performed by treating the unknown partition as a random object and defining a prior distribution on it. This prior distribution may be induced by a model on the observations or defined directly on the partition. Several recent results, however, have shown how difficult it is to estimate the number of clusters, and therefore the partition, consistently. Summarising the posterior distribution on the partition is itself an open problem, given the large size of the partition space. This work reviews the Bayesian approaches to clustering available in the literature and presents the advantages and disadvantages of each in order to suggest future lines of research.