Hierarchical mixtures of Gaussians for combined dimensionality reduction and clustering

06/10/2022
by Sacha Sokoloski, et al.

To avoid the curse of dimensionality, a common approach to clustering high-dimensional data is to first project the data into a space of reduced dimension, and then cluster the projected data. Although effective, this two-stage approach prevents joint optimization of the dimensionality-reduction and clustering models, and obscures how well the complete model describes the data. Here, we show how a family of such two-stage models can be combined into a single, hierarchical model that we call a hierarchical mixture of Gaussians (HMoG). An HMoG simultaneously captures both dimensionality reduction and clustering, and its performance is quantified in closed form by the likelihood function. By formulating and extending existing models with exponential family theory, we show how to maximize the likelihood of HMoGs with expectation-maximization. We apply HMoGs to synthetic data and RNA sequencing data, and demonstrate how they overcome the limitations of two-stage models. Ultimately, HMoGs are a rigorous generalization of a common statistical framework, and provide researchers with a method to improve model performance when clustering high-dimensional data.
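
To illustrate what a closed-form likelihood can look like for a model of this kind, the following is a minimal sketch and not the authors' implementation. It assumes (this is not stated in the abstract) a one-dimensional latent variable drawn from a mixture of Gaussians, a linear projection `w` with offset into the observed space, and isotropic observation noise. Under those assumptions the marginal over observations is itself a mixture of Gaussians with rank-one-plus-isotropic covariances, so the log-likelihood and the cluster responsibilities can be computed exactly. All function names and parameter values are hypothetical.

```python
import math

def log_gauss_rank1(x, mean, w, latent_var, noise_var):
    """Log density of N(x; mean, noise_var * I + latent_var * w w^T)."""
    dim = len(x)
    d = [xi - mi for xi, mi in zip(x, mean)]
    dd = sum(v * v for v in d)
    wd = sum(wi * di for wi, di in zip(w, d))
    ww = sum(wi * wi for wi in w)
    # Matrix determinant lemma gives the log-determinant of the covariance.
    log_det = dim * math.log(noise_var) + math.log1p(latent_var * ww / noise_var)
    # Sherman-Morrison gives the quadratic form without a matrix inverse.
    quad = dd / noise_var - latent_var * wd * wd / (
        noise_var * (noise_var + latent_var * ww)
    )
    return -0.5 * (dim * math.log(2 * math.pi) + log_det + quad)

def hmog_log_likelihood(x, weights, latent_means, latent_vars, w, offset, noise_var):
    """Return (log p(x), responsibilities) under the assumed hierarchical model."""
    log_terms = []
    for pi_k, mu_k, s_k in zip(weights, latent_means, latent_vars):
        # Each latent cluster maps to an observed-space mean offset + w * mu_k.
        mean_k = [b + wi * mu_k for wi, b in zip(w, offset)]
        log_terms.append(
            math.log(pi_k) + log_gauss_rank1(x, mean_k, w, s_k, noise_var)
        )
    # Log-sum-exp for numerical stability.
    top = max(log_terms)
    log_px = top + math.log(sum(math.exp(t - top) for t in log_terms))
    resp = [math.exp(t - log_px) for t in log_terms]
    return log_px, resp

# Hypothetical toy parameters: two latent clusters, three observed dimensions.
weights = [0.5, 0.5]
latent_means = [-2.0, 2.0]
latent_vars = [0.5, 0.5]
w = [1.0, 0.5, -0.5]
offset = [0.0, 0.0, 0.0]
noise_var = 0.1

x = [2.1, 0.9, -1.1]
log_px, resp = hmog_log_likelihood(
    x, weights, latent_means, latent_vars, w, offset, noise_var
)
print("log-likelihood:", log_px)
print("responsibilities:", resp)
```

Because the projection and the mixture enter a single likelihood, all parameters could in principle be fitted jointly (for example with expectation-maximization), rather than fixing the projection first and clustering afterwards as in a two-stage pipeline.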

