A Heat Diffusion Perspective on Geodesic Preserving Dimensionality Reduction

05/30/2023
by Guillaume Huguet, et al.

Diffusion-based manifold learning methods have proven useful in representation learning and dimensionality reduction of modern high-dimensional, high-throughput, noisy datasets. Such datasets are especially common in fields such as biology and physics. While it is thought that these methods preserve the underlying manifold structure of data by learning a proxy for geodesic distances, no specific theoretical links have been established. Here, we establish such a link via results in Riemannian geometry that explicitly connect heat diffusion to manifold distances. In this process, we also formulate a more general heat-kernel-based manifold embedding method that we call heat geodesic embeddings. This novel perspective makes clearer the choices available in manifold learning and denoising. Results show that our method outperforms existing state-of-the-art methods in preserving ground truth manifold distances and in preserving cluster structure in toy datasets. We also showcase our method on single-cell RNA-sequencing datasets with both continuum and cluster structure, where our method enables interpolation of withheld timepoints of data. Finally, we show that the parameters of our more general method can be configured to give results similar to PHATE (a state-of-the-art diffusion-based manifold learning method) as well as SNE (an attraction/repulsion neighborhood-based method that forms the basis of t-SNE).
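To make the link between heat diffusion and manifold distances concrete, here is a minimal sketch based on one classical result of this kind, Varadhan's formula: d(x, y)^2 = lim_{t -> 0} -4 t log h_t(x, y), where h_t is the heat kernel. The abstract does not say which result or which exact distance definition the authors use, so this should be read as an illustration under stated assumptions and not as the heat geodesic embedding method itself. The function names, the path-graph toy data, and the choice t = 0.5 are all hypothetical, and the graph heat kernel exp(-tL) stands in for the manifold heat kernel.

```python
import numpy as np


def path_graph_adjacency(n):
    """Adjacency matrix of a path graph on n nodes (a toy 1-D 'manifold')."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return A


def heat_kernel(A, t):
    """Graph heat kernel exp(-t L) with the combinatorial Laplacian L = D - A."""
    L = np.diag(A.sum(axis=1)) - A
    # L is symmetric, so exp(-t L) can be formed from its eigendecomposition.
    evals, evecs = np.linalg.eigh(L)
    return (evecs * np.exp(-t * evals)) @ evecs.T


def varadhan_distances(A, t, floor=1e-300):
    """Distance proxy sqrt(-4 t log h_t(x, y)) at a fixed small t (assumption:
    a single finite t is used in place of the t -> 0 limit)."""
    # Clip tiny or slightly negative entries caused by round-off before the log.
    H = np.clip(heat_kernel(A, t), floor, None)
    return np.sqrt(np.maximum(-4.0 * t * np.log(H), 0.0))


A = path_graph_adjacency(8)
H = heat_kernel(A, t=0.5)
D = varadhan_distances(A, t=0.5)
# Entries of the first row grow with the number of hops from node 0.
print(np.round(D[0], 3))
```

At a finite t the diagonal of D is not exactly zero and the values are not equal to hop counts; the sketch only shows that the log of the heat kernel orders points consistently with their distance along the path.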

research
07/03/2023

Supervised Manifold Learning via Random Forest Geometry-Preserving Proximities

Manifold learning approaches seek the intrinsic, low-dimensional data st...
research
11/02/2022

Geodesic Sinkhorn: optimal transport for high-dimensional datasets

Understanding the dynamics and reactions of cells from population snapsh...
research
10/05/2020

Learning Manifold Implicitly via Explicit Heat-Kernel Learning

Manifold learning is a fundamental problem in machine learning with nume...
research
02/25/2021

Diffusion Earth Mover's Distance and Distribution Embeddings

We propose a new fast method of measuring distances between large number...
research
12/08/2014

Web image annotation by diffusion maps manifold learning algorithm

Automatic image annotation is one of the most challenging problems in ma...
research
06/20/2012

Statistical Translation, Heat Kernels and Expected Distances

High dimensional structured data such as text and images is often poorly...
research
01/05/2020

Multi-Objective Genetic Programming for Manifold Learning: Balancing Quality and Dimensionality

Manifold learning techniques have become increasingly valuable as data c...
