Stagewise Learning for Sparse Clustering of Discretely-Valued Data

06/09/2015
by Vincent Zhao, et al.

The performance of EM in learning mixtures of product distributions often depends on the initialization. This can be problematic in crowdsourcing and other applications, e.g., when a small number of 'experts' are diluted by a large number of noisy, unreliable participants. We develop a new EM algorithm that is driven by these experts. Unlike other approaches, we start from a single mixture class. The algorithm then grows the set of 'experts' in a stagewise fashion based on a mutual information criterion. At each stage, EM operates on this subset of the players, effectively regularizing the E step rather than the M step. Experiments show that stagewise EM outperforms other initialization techniques for crowdsourcing and neuroscience applications, and can guide a full EM to results comparable to those obtained knowing the exact distribution.
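
Below is a minimal sketch, not the authors' code, of the stagewise idea the abstract describes, assuming a crowdsourcing setting with binary worker labels: workers ('players') are added to the expert set one at a time by scoring the mutual information between their labels and the current clustering, and a plain product-Bernoulli EM runs only on the selected columns. The function names (stagewise_em, em_product_bernoulli), the seeding rule, and the number of stages are illustrative assumptions, not the paper's exact procedure.

    # Hypothetical sketch of stagewise EM for a mixture of product-Bernoulli
    # distributions; details (seeding, stage count) are assumptions, not the paper's.
    import numpy as np
    from sklearn.metrics import mutual_info_score

    def em_product_bernoulli(X, K, n_iter=50, rng=None):
        """Plain EM for a K-class mixture of product-Bernoulli distributions.
        X: (n_items, n_workers) binary matrix restricted to the chosen 'experts'."""
        rng = np.random.default_rng(rng)
        n, d = X.shape
        pi = np.full(K, 1.0 / K)                      # mixing weights
        theta = rng.uniform(0.25, 0.75, size=(K, d))  # per-class Bernoulli parameters
        resp = None
        for _ in range(n_iter):
            # E step: posterior responsibilities, computed in log space for stability
            log_post = (X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T
                        + np.log(pi))
            log_post -= log_post.max(axis=1, keepdims=True)
            resp = np.exp(log_post)
            resp /= resp.sum(axis=1, keepdims=True)
            # M step: update mixing weights and Bernoulli parameters (lightly smoothed)
            pi = resp.mean(axis=0)
            theta = (resp.T @ X + 1e-3) / (resp.sum(axis=0)[:, None] + 2e-3)
        return resp

    def stagewise_em(X, K, n_stages=5, rng=0):
        """Grow the set of 'expert' columns stagewise; EM only ever sees that subset,
        which regularizes the E step as described in the abstract."""
        n, d = X.shape
        experts = [int(np.argmax(X.var(axis=0)))]   # seed: most variable worker (assumption)
        for _ in range(n_stages - 1):
            resp = em_product_bernoulli(X[:, experts], K, rng=rng)
            labels = resp.argmax(axis=1)
            # score remaining workers by mutual information with the current clustering
            candidates = [j for j in range(d) if j not in experts]
            scores = [mutual_info_score(labels, X[:, j]) for j in candidates]
            experts.append(candidates[int(np.argmax(scores))])
        return experts, em_product_bernoulli(X[:, experts], K, rng=rng)

With the selected subset in hand, the resulting responsibilities can be used to initialize a full EM over all workers, which is the role the stagewise pass plays in the abstract's experiments.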
