One-Pass Sparsified Gaussian Mixtures
We present a one-pass sparsified Gaussian mixture model (SGMM). Given P-dimensional datapoints X = {x_i}_{i=1}^N, the model fits K Gaussian distributions to X and (softly) assigns each x_i to these clusters. After paying an up-front cost of O(NP log P) to precondition the data, we subsample Q entries of each datapoint and discard the full P-dimensional data. SGMM operates in O(KNQ) time per iteration for diagonal or spherical covariances, independent of P, while estimating the model parameters θ in the full P-dimensional space, making it one-pass and hence suitable for streaming data. We derive the maximum likelihood estimators for θ in the sparsified regime, demonstrate clustering on synthetic and real data, and show that SGMM is faster than GMM while preserving accuracy.
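The pipeline in the abstract (precondition, keep only Q entries per point, then do O(KNQ) soft-assignment work on the observed coordinates) can be sketched roughly as follows. This is a hedged illustration, not the paper's implementation: the dense QR-based rotation standing in for the preconditioner, the unit-variance spherical components, and the use of the true centers in place of learned parameters are all simplifying assumptions made here.

```python
import numpy as np

rng = np.random.default_rng(0)
N, P, Q, K = 200, 64, 8, 3  # illustrative sizes, not from the paper

# Synthetic data: N points in P dimensions drawn around K well-separated centers.
centers = rng.normal(scale=5.0, size=(K, P))
labels = rng.integers(K, size=N)
X = centers[labels] + rng.normal(size=(N, P))

# Stand-in preconditioning step: a random orthogonal rotation spreads each
# point's energy across coordinates so that subsampling loses little signal.
# (A fast structured transform would achieve the O(NP log P) cost; a dense
# QR factor is used here purely for illustration.)
H, _ = np.linalg.qr(rng.normal(size=(P, P)))
Xp = X @ H
mu = centers @ H  # component means in the preconditioned space

# Keep Q randomly chosen entries of each datapoint; discard the rest.
idx = np.stack([rng.choice(P, size=Q, replace=False) for _ in range(N)])
vals = np.take_along_axis(Xp, idx, axis=1)  # shape (N, Q)

# Soft assignment (E-step) with spherical unit-variance components,
# evaluated only on each point's observed coordinates: O(KNQ) work total.
logw = np.empty((N, K))
for k in range(K):
    mu_k = mu[k][idx]  # component k's mean restricted to each point's entries
    logw[:, k] = -0.5 * np.sum((vals - mu_k) ** 2, axis=1)
resp = np.exp(logw - logw.max(axis=1, keepdims=True))
resp /= resp.sum(axis=1, keepdims=True)  # responsibilities; rows sum to 1
```

Because the rotation is orthogonal, squared distances are preserved, so even a small random subset of coordinates retains most of the between-cluster separation on average; the responsibilities then recover the planted clustering with high accuracy in this toy setting.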