Theoretical Foundations of t-SNE for Visualizing High-Dimensional Clustered Data

05/16/2021
by   T. Tony Cai, et al.

This study investigates the theoretical foundations of t-distributed stochastic neighbor embedding (t-SNE), a popular nonlinear dimension reduction and data visualization method. A novel theoretical framework for the analysis of t-SNE based on the gradient descent approach is presented. For the early exaggeration stage of t-SNE, we show its asymptotic equivalence to a power iteration based on the underlying graph Laplacian, characterize its limiting behavior, and uncover its deep connection to Laplacian spectral clustering and to fundamental principles such as early stopping as implicit regularization. The results explain the intrinsic mechanism and the empirical benefits of this computational strategy. For the embedding stage of t-SNE, we characterize the kinematics of the low-dimensional map throughout the iterations and identify an amplification phase, featuring intercluster repulsion and expansive behavior of the low-dimensional map. The general theory explains the fast convergence rate and the exceptional empirical performance of t-SNE for visualizing clustered data, yields interpretations of the t-SNE output, and provides theoretical guidance for selecting tuning parameters in various applications.
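To give a feel for the power-iteration view of the early exaggeration stage, here is a minimal sketch in Python. The abstract does not state the exact update rule, so everything below is an assumption made for illustration: the iteration Y <- (I - step * L) Y with the unnormalized Laplacian L = D - P, the step size, the toy two-cluster affinity matrix, and the function names are all hypothetical choices, not the paper's construction.

```python
import numpy as np

def graph_laplacian(P):
    """Unnormalized graph Laplacian L = D - P of a symmetric affinity matrix P."""
    return np.diag(P.sum(axis=1)) - P

def power_iteration_embedding(P, Y0, step, n_iter):
    """Apply Y <- (I - step * L) Y for n_iter steps (assumed form of the iteration)."""
    M = np.eye(P.shape[0]) - step * graph_laplacian(P)
    Y = Y0.copy()
    for _ in range(n_iter):
        Y = M @ Y
    return Y

def within_cluster_spread(Y, labels):
    """Largest distance from any point to the mean of its own cluster."""
    return max(
        np.linalg.norm(Y[labels == k] - Y[labels == k].mean(axis=0), axis=1).max()
        for k in np.unique(labels)
    )

# Toy affinity (an assumption): two clusters of 5 points each, uniform affinity
# within a cluster and zero affinity between clusters, normalized to sum to 1.
labels = np.repeat([0, 1], 5)
P = (labels[:, None] == labels[None, :]).astype(float)
np.fill_diagonal(P, 0.0)
P /= P.sum()

rng = np.random.default_rng(0)
Y0 = rng.normal(size=(10, 2))          # random 2-D initialization
Y = power_iteration_embedding(P, Y0, step=1.0, n_iter=50)

spread_before = within_cluster_spread(Y0, labels)
spread_after = within_cluster_spread(Y, labels)
print(f"within-cluster spread: {spread_before:.4f} -> {spread_after:.6f}")
```

In this toy setting each cluster contracts toward its own mean while the two cluster means stay where they started, because the constant vector on each block lies in the null space of L. If the between-cluster affinities were small but nonzero, the same iteration would eventually pull all points toward a single global mean, which is one way to read the abstract's remark that early stopping acts as implicit regularization.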

research
07/25/2022

Laplacian-based Cluster-Contractive t-SNE for High Dimensional Data Visualization

Dimensionality reduction techniques aim at representing high-dimensional...
research
02/28/2022

Learning Low-Dimensional Nonlinear Structures from High-Dimensional Noisy Data: An Integral Operator Approach

We propose a kernel-spectral embedding algorithm for learning low-dimens...
research
05/23/2019

Geometric Laplacian Eigenmap Embedding

Graph embedding seeks to build a low-dimensional representation of a gra...
research
06/13/2020

Consistent Semi-Supervised Graph Regularization for High Dimensional Data

Semi-supervised Laplacian regularization, a standard graph-based approac...
research
10/25/2022

A Spectral Method for Assessing and Combining Multiple Data Visualizations

Dimension reduction and data visualization aim to project a high-dimensi...
research
05/05/2023

Random Smoothing Regularization in Kernel Gradient Descent Learning

Random smoothing data augmentation is a unique form of regularization th...
research
11/26/2013

Auto-adaptative Laplacian Pyramids for High-dimensional Data Analysis

Non-linear dimensionality reduction techniques such as manifold learning...
