Diffusion Earth Mover's Distance and Distribution Embeddings

02/25/2021
by Alexander Tong, et al.

We propose Diffusion Earth Mover's Distance (Diffusion EMD), a new fast method for measuring distances between large numbers of related high-dimensional datasets. We model the datasets as distributions supported on a common data graph that is derived from the affinity matrix computed on the combined data. In cases where the graph is a discretization of an underlying closed Riemannian manifold, we prove that Diffusion EMD is topologically equivalent to the standard EMD with a geodesic ground distance. Diffusion EMD can be computed in Õ(n) time and is more accurate than similarly fast algorithms such as tree-based EMDs. We also show that Diffusion EMD is fully differentiable, making it amenable to future use in gradient-descent frameworks such as deep neural networks. Finally, we demonstrate an application of Diffusion EMD to single-cell data collected from 210 COVID-19 patient samples at Yale New Haven Hospital. Here, Diffusion EMD can derive distances between patients on the manifold of cells at least two orders of magnitude faster than equally accurate methods. This distance matrix between patients can be embedded into a higher-level patient manifold, which uncovers structure and heterogeneity in patients. More generally, Diffusion EMD is applicable to any collection of datasets gathered massively in parallel, as is common in many medical and biological systems.
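
The abstract describes the pipeline only at a high level: combine the data, build an affinity graph, treat each dataset as a distribution on that graph, and compare the distributions after diffusion. The sketch below is one possible reading of that pipeline, not the authors' implementation. The function name diffusion_emd_sketch, the Gaussian kernel, the dyadic diffusion scales, the weight 2^(-(K-k-1)*alpha), and all parameter defaults are assumptions made for illustration. It also uses a dense affinity matrix, so it runs in O(n^2) time and does not reproduce the Õ(n) cost stated above.

```python
import numpy as np

def diffusion_emd_sketch(points, labels, n_scales=4, alpha=0.5, bandwidth=1.0):
    """Illustrative pairwise distances between datasets via multiscale diffusion.

    points: (n, d) array holding the combined data from all datasets.
    labels: (n,) array saying which dataset each point belongs to.
    All names and defaults here are hypothetical, chosen for the example.
    """
    # Gaussian affinity matrix on the combined data (assumed kernel, dense).
    sq_dists = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    affinity = np.exp(-sq_dists / (2.0 * bandwidth ** 2))

    # Column-stochastic diffusion operator: every column sums to 1,
    # so applying it to a distribution yields another distribution.
    diffusion = affinity / affinity.sum(axis=0, keepdims=True)

    # One column per dataset: the uniform distribution over its own points.
    datasets = np.unique(labels)
    mu = (labels[:, None] == datasets[None, :]).astype(float)
    mu /= mu.sum(axis=0, keepdims=True)

    # Diffuse each distribution for t = 1, 2, 4, ..., 2^K steps.
    current = diffusion @ mu
    scales = [current]
    steps = 1
    for _ in range(n_scales):
        for _ in range(steps):
            current = diffusion @ current
        steps *= 2
        scales.append(current)

    # Weighted differences between successive scales, plus the coarsest scale.
    feats = [
        2.0 ** (-(n_scales - k - 1) * alpha) * (scales[k + 1] - scales[k])
        for k in range(n_scales)
    ]
    feats.append(scales[-1])
    embedding = np.concatenate(feats, axis=0).T  # one row per dataset

    # The distance between two datasets is the L1 distance of their embeddings.
    return np.abs(embedding[:, None, :] - embedding[None, :, :]).sum(axis=-1)

# Toy usage: datasets 0 and 1 lie close together, dataset 2 lies far away.
toy_points = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 10.0, 10.1, 10.2])[:, None]
toy_labels = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
toy_dist = diffusion_emd_sketch(toy_points, toy_labels)
```

Because each dataset is reduced to a single embedding vector, the distance matrix over many datasets comes from one pass of diffusion followed by pairwise L1 comparisons, with no separate optimal transport problem solved per pair. Every step is a matrix product, an elementwise operation, or an absolute value, which is in keeping with the differentiability claim in the abstract.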

research
07/26/2021

Embedding Signals on Knowledge Graphs with Unbalanced Diffusion Earth Mover's Distance

In modern relational machine learning it is common to encounter large gr...
research
03/07/2020

Diffusion State Distances: Multitemporal Analysis, Fast Algorithms, and Applications to Biological Networks

Data-dependent metrics are powerful tools for learning the underlying st...
research
04/01/2023

Diffusion map particle systems for generative modeling

We propose a novel diffusion map particle system (DMPS) for generative m...
research
05/30/2023

A Heat Diffusion Perspective on Geodesic Preserving Dimensionality Reduction

Diffusion-based manifold learning methods have proven useful in represen...
research
12/23/2022

Your diffusion model secretly knows the dimension of the data manifold

In this work, we propose a novel framework for estimating the dimension ...
research
07/20/2023

Fisher-Rao distance and pullback SPD cone distances between multivariate normal distributions

Data sets of multivariate normal distributions abound in many scientific...
research
07/03/2017

People Mover's Distance: Class level geometry using fast pairwise data adaptive transportation costs

We address the problem of defining a network graph on a large collection...
