The Shape of Data and Probability Measures

We introduce multiscale covariance tensor fields (CTFs) associated with Euclidean random variables as a gateway to the shape of their distributions. Multiscale CTFs quantify variation of the data about every point in the data landscape at all spatial scales, unlike the usual covariance tensor, which quantifies only global variation about the mean. Empirical forms of localized covariance have previously been used in data analysis and visualization, but we develop a framework for the systematic treatment of theoretical questions and computational models based on localized covariance. We prove strong stability theorems with respect to the Wasserstein distance between probability measures, and we obtain consistency results and estimates for the rate of convergence of empirical CTFs. These results ensure that CTFs are robust to sampling, noise, and outliers. We provide numerous illustrations of how CTFs let us extract shape from data, and we also apply CTFs to manifold clustering, the problem of categorizing data points according to their noisy membership in a collection of possibly intersecting, smooth submanifolds of Euclidean space. We prove that the proposed manifold clustering method is stable and carry out several experiments to validate the method.
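
To make the idea of covariance "about every point at every scale" concrete, the following is a minimal sketch of one possible empirical localized covariance. The abstract does not state the definition, so everything here is an assumption for illustration: the function name `empirical_ctf`, the isotropic Gaussian kernel, and the normalization by the sample size are hypothetical choices, not necessarily those of the paper.

```python
import numpy as np

def empirical_ctf(points, x, sigma):
    """Kernel-weighted covariance of `points` about the base point `x`.

    Assumed form (illustrative only, not taken from the paper):
        (1/n) * sum_i K_sigma(x, y_i) * (y_i - x)(y_i - x)^T
    with a Gaussian kernel K_sigma(x, y) = exp(-|y - x|^2 / (2 sigma^2)).
    """
    points = np.asarray(points, dtype=float)   # shape (n, d)
    x = np.asarray(x, dtype=float)             # shape (d,)
    diffs = points - x                         # displacements y_i - x
    sq_dist = np.sum(diffs ** 2, axis=1)       # squared distances |y_i - x|^2
    weights = np.exp(-sq_dist / (2.0 * sigma ** 2))
    # Weighted sum of outer products, averaged over the sample.
    return (diffs * weights[:, None]).T @ diffs / len(points)

# Hypothetical data: noisy samples near the horizontal axis in the plane.
rng = np.random.default_rng(0)
sample = np.column_stack([rng.uniform(-1, 1, 200), 0.01 * rng.normal(size=200)])

# Evaluate the tensor at one base point for a small and a large scale.
small_scale = empirical_ctf(sample, x=[0.0, 0.0], sigma=0.1)
large_scale = empirical_ctf(sample, x=[0.0, 0.0], sigma=10.0)
```

In this sketch, varying `x` over a grid and `sigma` over a range of values yields a field of symmetric positive semidefinite matrices. For very large `sigma` and `x` at the sample mean, the weights approach 1 and the result approaches the ordinary (biased) sample covariance, which matches the contrast the abstract draws between localized and global variation.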
