Entropic Wasserstein Component Analysis

03/09/2023
by Antoine Collas, et al.

Dimension reduction (DR) methods provide systematic approaches for analyzing high-dimensional data. A key requirement for DR is to incorporate global dependencies among original and embedded samples while preserving clusters in the embedding space. To achieve this, we combine the principles of optimal transport (OT) and principal component analysis (PCA). Our method seeks the best linear subspace that minimizes reconstruction error using entropic OT, which naturally encodes the neighborhood information of the samples. From an algorithmic standpoint, we propose an efficient block-majorization-minimization solver over the Stiefel manifold. Our experimental results demonstrate that our approach can effectively preserve high-dimensional clusters, leading to more interpretable and effective embeddings. Python code of the algorithms and experiments is available online.
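The alternating structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the reconstruction cost is the squared Euclidean distance between each sample and the projection of every sample onto the subspace, weighted by an entropic OT plan with uniform marginals. The names `sinkhorn_log`, `ewca_sketch`, `eps`, and `n_outer` are hypothetical. For simplicity, the subspace update uses an exact eigendecomposition for a fixed plan instead of the block-majorization-minimization solver mentioned above.

```python
import numpy as np


def _logsumexp(a, axis):
    # Numerically stable log(sum(exp(a))) along one axis.
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def sinkhorn_log(C, eps, n_iter=200):
    # Log-domain Sinkhorn iterations with uniform marginals (an assumption).
    n, m = C.shape
    log_a = np.full(n, -np.log(n))
    log_b = np.full(m, -np.log(m))
    f = np.zeros(n)
    g = np.zeros(m)
    for _ in range(n_iter):
        f = eps * (log_a - _logsumexp((g[None, :] - C) / eps, axis=1))
        g = eps * (log_b - _logsumexp((f[:, None] - C) / eps, axis=0))
    # Transport plan pi_ij = exp((f_i + g_j - C_ij) / eps).
    return np.exp((f[:, None] + g[None, :] - C) / eps)


def ewca_sketch(X, k, eps=1.0, n_outer=20, seed=0):
    # Hypothetical alternating scheme: plan update, then subspace update.
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)  # center the data
    n, d = X.shape
    # Random orthonormal start: a point on the Stiefel manifold St(d, k).
    U, _ = np.linalg.qr(rng.standard_normal((d, k)))
    for _ in range(n_outer):
        Z = X @ U @ U.T  # projections of the samples onto span(U)
        # C_ij = ||x_i - U U^T x_j||^2
        C = (X**2).sum(1)[:, None] + (Z**2).sum(1)[None, :] - 2.0 * X @ Z.T
        C = np.maximum(C, 0.0)  # guard against tiny negative round-off
        pi = sinkhorn_log(C, eps)
        # For fixed pi the cost equals const - tr(U^T M U) with M symmetric,
        # so the minimizer is spanned by the top-k eigenvectors of M.
        col_mass = pi.sum(axis=0)
        M = X.T @ (pi + pi.T) @ X - X.T @ (col_mass[:, None] * X)
        _, V = np.linalg.eigh(M)  # eigenvalues in ascending order
        U = V[:, -k:]
    return U, pi
```

Calling `U, pi = ewca_sketch(X, k=2)` on an `(n, d)` array returns an orthonormal basis `U` of shape `(d, 2)` and the final plan `pi`; the embedding is then `(X - X.mean(axis=0)) @ U`. The value of `eps` lives on the scale of squared distances, so it would need tuning to the data.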


