Quasi-Bernoulli Stick-breaking: Infinite Mixture with Cluster Consistency

08/23/2020
by   Cheng Zeng, et al.
0

In mixture modeling and clustering application, the number of components is often not known. The stick-breaking model is an appealing construction that assumes infinitely many components, while shrinking most of the redundant weights to near zero. However, it has been discovered that such a shrinkage is unsatisfactory: even when the component distribution is correctly specified, small and spurious weights will appear and give an inconsistent estimate on the cluster number. In this article, we propose a simple solution that gains stronger control on the redundant weights – when breaking each stick into two pieces, we adjust the length of the second piece by multiplying it to a quasi-Bernoulli random variable, supported at one and a positive constant close to zero. This substantially increases the chance of shrinking all the redundant weights to almost zero, leading to a consistent estimator on the cluster number; at the same time, it avoids the singularity due to assigning an exactly zero weight, and maintains a support in the infinite-dimensional space. As a stick-breaking model, its posterior computation can be carried out efficiently via the classic blocked Gibbs sampler, allowing straightforward extension of using non-Gaussian components. Compared to existing methods, our model demonstrates much superior performances in the simulations and data application, showing a substantial reduction in the number of clusters.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset