T-SNE Is Not Optimized to Reveal Clusters in Data

10/06/2021
by Zhirong Yang, et al.

Cluster visualization is an essential task for nonlinear dimensionality reduction as a data analysis tool. It is often believed that Student t-Distributed Stochastic Neighbor Embedding (t-SNE) can show clusters for well clusterable data, with a smaller Kullback-Leibler divergence corresponding to better quality. There is even a theoretical proof that guarantees this property. However, we point out that this is not necessarily the case: t-SNE may leave clustering patterns hidden despite strong signals present in the data. Extensive empirical evidence is provided to support our claim. First, several real-world counter-examples are presented, where t-SNE fails even if the input neighborhoods are well clusterable. Tuning hyperparameters in t-SNE or using better optimization algorithms does not solve this issue, because a better value of the t-SNE learning objective can correspond to a worse cluster embedding. Second, we check the assumptions in the clustering guarantee of t-SNE and find that they are often violated for real-world data sets.
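For readers unfamiliar with the learning objective mentioned above, the following is a minimal sketch of the standard t-SNE cost, the Kullback-Leibler divergence between input similarities P and the Student-t similarities Q of an embedding. It is not the authors' code. The function names `tsne_q` and `tsne_kl` and the toy layouts are hypothetical, and P is taken here to be the Q matrix of a reference layout rather than the perplexity-calibrated Gaussian similarities that t-SNE normally computes from high-dimensional data.

```python
import math

def tsne_q(Y):
    """Student-t similarities q_ij of a layout Y, normalized over all pairs i != j."""
    n = len(Y)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(Y[i], Y[j]))
                w[i][j] = 1.0 / (1.0 + d2)  # heavy-tailed kernel with one degree of freedom
    total = sum(map(sum, w))
    return [[w[i][j] / total for j in range(n)] for i in range(n)]

def tsne_kl(P, Y):
    """KL(P || Q) = sum over i != j of p_ij * log(p_ij / q_ij)."""
    Q = tsne_q(Y)
    n = len(Y)
    return sum(
        P[i][j] * math.log(P[i][j] / Q[i][j])
        for i in range(n)
        for j in range(n)
        if i != j and P[i][j] > 0
    )

# Toy assumption: P is the similarity matrix of a reference layout with two tight pairs.
Y_ref = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
P = tsne_q(Y_ref)

# A second layout that places points 1 and 2 in each other's positions.
Y_swapped = [(0.0, 0.0), (5.0, 5.0), (0.1, 0.0), (5.1, 5.0)]

kl_ref = tsne_kl(P, Y_ref)          # Q equals P, so the divergence vanishes
kl_swapped = tsne_kl(P, Y_swapped)  # Q differs from P, so the divergence is positive
```

The objective only scores how well Q matches P pair by pair; nothing in it refers to cluster separation as such, which is the quantity the abstract argues can diverge from the objective value.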


research
08/18/2021

Stochastic Cluster Embedding

Neighbor Embedding (NE) that aims to preserve pairwise similarities betw...
research
06/10/2011

A Computational Framework for Nonlinear Dimensionality Reduction of Large Data Sets: The Exploratory Inspection Machine (XIM)

In this paper, we present a novel computational framework for nonlinear ...
research
02/09/2017

Stochastic Neighbor Embedding separates well-separated clusters

Stochastic Neighbor Embedding and its variants are widely used dimension...
research
10/30/2019

Meta-Learning to Cluster

Clustering is one of the most fundamental and wide-spread techniques in ...
research
10/04/2022

Detection and Evaluation of Clusters within Sequential Data

Motivated by theoretical advancements in dimensionality reduction techni...
research
12/16/2021

Hierarchical Clustering: O(1)-Approximation for Well-Clustered Graphs

Hierarchical clustering studies a recursive partition of a data set into...
research
03/05/2018

An Analysis of the t-SNE Algorithm for Data Visualization

A first line of attack in exploratory data analysis is data visualizatio...
