Revised Conditional t-SNE: Looking Beyond the Nearest Neighbors

02/07/2023
by Edith Heiter, et al.

Conditional t-SNE (ct-SNE) is a recent extension to t-SNE that removes known cluster information from the embedding, yielding a visualization that reveals structure beyond the label information. This is useful, for example, when one wants to factor out unwanted differences between a set of classes. We show that ct-SNE fails in many realistic settings, namely when the data is well clustered over the labels in the original high-dimensional space. We introduce a revised method that conditions the high-dimensional similarities instead of the low-dimensional similarities and stores within-label and across-label nearest neighbors separately. This also enables the use of recently proposed speedups for t-SNE, improving scalability. In experiments on synthetic data, we find that our proposed method resolves the considered problems and improves the embedding quality. On real data containing batch effects, the expected improvement does not always materialize. We argue that revised ct-SNE is nevertheless preferable overall, given its improved scalability. The results also highlight new open questions, such as how to handle distance variations between clusters.
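To make the idea of separate neighbor sets and label-conditioned high-dimensional similarities more concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: the function names `split_neighbors` and `conditioned_similarities`, the fixed Gaussian bandwidth `sigma`, and the down-weighting factor `same_weight` are all assumptions made for illustration. The abstract does not give the actual conditioning formula, and real t-SNE calibrates a per-point bandwidth from the perplexity instead of using one fixed value.

```python
import numpy as np


def split_neighbors(X, labels, k):
    """Brute-force search for the k nearest same-label and the k nearest
    other-label neighbors of every point (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared distances
    np.maximum(d2, 0.0, out=d2)       # clip tiny negatives from round-off
    np.fill_diagonal(d2, np.inf)      # a point is never its own neighbor
    same = labels[:, None] == labels[None, :]
    within, across = [], []
    for i in range(len(X)):
        d_same = np.where(same[i], d2[i], np.inf)
        d_diff = np.where(~same[i], d2[i], np.inf)
        w = np.argsort(d_same)[:k]
        a = np.argsort(d_diff)[:k]
        # Drop padding entries when fewer than k candidates exist.
        within.append(w[np.isfinite(d_same[w])])
        across.append(a[np.isfinite(d_diff[a])])
    return within, across, d2


def conditioned_similarities(d2, within, across, sigma=1.0, same_weight=0.1):
    """Sparse Gaussian similarities in which same-label neighbors are
    down-weighted by `same_weight` (an assumed scheme, not the paper's)."""
    n = d2.shape[0]
    P = np.zeros((n, n))
    for i in range(n):
        for idx, weight in ((within[i], same_weight), (across[i], 1.0)):
            P[i, idx] = weight * np.exp(-d2[i, idx] / (2.0 * sigma ** 2))
        row_sum = P[i].sum()
        if row_sum > 0:
            P[i] /= row_sum           # conditional distribution for point i
    return (P + P.T) / (2.0 * n)      # symmetrize as in standard t-SNE


# Toy data: two well-separated clusters whose labels match the clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (10, 5)), rng.normal(5.0, 1.0, (10, 5))])
labels = np.array([0] * 10 + [1] * 10)

within, across, d2 = split_neighbors(X, labels, k=3)
P = conditioned_similarities(d2, within, across, sigma=3.0)
```

Keeping the two neighbor lists separate means that across-label neighbors are always available, even when every one of a point's overall nearest neighbors shares its label, which is the well-clustered setting the abstract identifies as problematic.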

research
05/24/2019

Conditional t-SNE: Complementary t-SNE embeddings through factoring out prior information

Dimensionality reduction and manifold learning methods such as t-Distrib...
research
01/13/2022

How I learned to stop worrying and love the curse of dimensionality: an appraisal of cluster validation in high-dimensional spaces

The failure of the Euclidean norm to reliably distinguish between nearby...
research
05/05/2014

K-NS: Section-Based Outlier Detection in High Dimensional Space

Finding rare information hidden in a huge amount of data from the Intern...
research
11/15/2019

Batch correction of high-dimensional data

Biomedical research often produces high-dimensional data confounded by b...
research
11/30/2022

High-Dimensional Wide Gap k-Means Versus Clustering Axioms

Kleinberg's axioms for distance based clustering proved to be contradict...
research
09/22/2021

Index t-SNE: Tracking Dynamics of High-Dimensional Datasets with Coherent Embeddings

t-SNE is an embedding method that the data science community has widely ...
