Can clustering scale sublinearly with its clusters? A variational EM acceleration of GMMs and k-means

11/09/2017
by Dennis Forster et al.

One iteration of k-means or EM for Gaussian mixture models (GMMs) scales linearly with the number of data points N, the number of clusters C, and the data dimensionality D. In this study, we explore whether one iteration of k-means or EM for GMMs can scale sublinearly with C at run time while still effectively increasing the clustering objective. The tool we apply for complexity reduction is variational EM, which is typically used to make the training of generative models with exponentially many hidden states tractable. Here, we apply novel theoretical results on truncated variational EM to make already tractable clustering algorithms more efficient. The basic idea is a partial variational E-step, which reduces the linear complexity of O(NCD) required for a full E-step to a sublinear complexity. Our main observation is that the linear dependency on C can be reduced to a dependency on a much smaller parameter G, which is related to the cluster neighborhood relationship. We focus on two versions of partial variational EM for clustering: variational GMM, scaling with O(NG^2D), and variational k-means, scaling with O(NGD) per iteration. Empirical results show that these algorithms still require numbers of iterations comparable to k-means in order to increase the clustering objective to the same values. For data with many clusters, we consequently observe reductions of the net computational demand of between two and three orders of magnitude. More generally, our results provide substantial empirical evidence that clustering can scale sublinearly with C.
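To make the partial E-step concrete, below is a minimal NumPy sketch of variational k-means in which each point's E-step searches only the G clusters nearest to its currently assigned cluster, which is what yields the O(NGD) per-iteration cost mentioned above. The function name var_kmeans, its parameters, and the exact O(C^2 D) cluster-neighborhood computation are illustrative assumptions for this sketch, not the authors' implementation; the paper's algorithm estimates cluster neighborhoods sublinearly rather than computing them exactly.

```python
import numpy as np

def var_kmeans(X, C, G=5, n_iter=50, seed=0):
    """Sketch of variational k-means with a truncated (partial) E-step.

    Each point only compares against the G centers nearest to its current
    cluster, so one E-step costs O(N G D) instead of O(N C D)."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    mu = X[rng.choice(N, C, replace=False)].copy()  # init centers from data
    assign = rng.integers(0, C, size=N)             # random initial assignments
    for _ in range(n_iter):
        # Cluster neighborhoods: the G nearest centers for every center.
        # (Exact O(C^2 D) here for simplicity; the paper instead estimates
        # these neighborhoods sublinearly.)
        cc = ((mu[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
        nbrs = np.argsort(cc, axis=1)[:, :G]  # self-distance is 0, so each
                                              # cluster is in its own set
        # Partial E-step: each point considers only the neighborhood of its
        # previously assigned cluster -> O(N G D).
        cand = nbrs[assign]                              # (N, G) candidates
        d = ((X[:, None, :] - mu[cand]) ** 2).sum(-1)    # (N, G) distances
        assign = cand[np.arange(N), d.argmin(1)]
        # M-step: recompute the mean of every non-empty cluster.
        for c in range(C):
            pts = X[assign == c]
            if len(pts):
                mu[c] = pts.mean(0)
    return mu, assign
```

Because each point's candidate set always contains its current cluster (the self-distance of a center is zero, so every cluster is among its own G nearest), the partial E-step can never increase the k-means objective, which is the property that licenses the truncation.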


