Distributional Results for Model-Based Intrinsic Dimension Estimators

04/28/2021
by Francesco Denti, et al.

Modern datasets are characterized by a large number of features that may conceal complex dependency structures. To deal with this type of data, dimensionality reduction techniques are essential. Many dimensionality reduction methods rely on the concept of intrinsic dimension, a measure of the complexity of the dataset. In this article, we first review the TWO-NN model, a likelihood-based intrinsic dimension estimator recently introduced in the literature. The TWO-NN estimator is based on the statistical properties of the ratio of the distances between a point and its first two nearest neighbors, assuming that the points are a realization of a homogeneous Poisson point process. We extend the TWO-NN theoretical framework by providing novel distributional results for consecutive and generic ratios of distances. These distributional results are then employed to derive two intrinsic dimension estimators, called Cride and Gride. The novel estimators are more robust to noisy measurements than the TWO-NN and allow one to study how the intrinsic dimension evolves as a function of the scale used to analyze the dataset. We discuss the properties of the different estimators with the help of simulation scenarios.
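
As a rough illustration of the TWO-NN idea described above, the following Python sketch computes, for each point, the ratio between the distances to its second and first nearest neighbors and plugs the ratios into a maximum-likelihood estimate. It relies on the result from the TWO-NN literature that, under the homogeneous Poisson assumption, this ratio follows a Pareto distribution with scale 1 and shape equal to the intrinsic dimension, so that the estimate is n divided by the sum of the log-ratios. The function name `two_nn_id`, the brute-force distance computation, and the toy data are assumptions made for illustration; this is not the authors' implementation, and it does not cover Cride or Gride.

```python
import numpy as np


def two_nn_id(points):
    """Sketch of a TWO-NN-style maximum-likelihood intrinsic dimension estimate.

    Hypothetical helper for illustration only. Assumes no duplicated points
    (a zero first-neighbor distance would make the ratio undefined).
    """
    x = np.asarray(points, dtype=float)
    n = x.shape[0]

    # Brute-force pairwise Euclidean distances; fine for small n only.
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))

    # Exclude each point's zero distance to itself.
    np.fill_diagonal(dist, np.inf)

    # Two smallest distances per row: column 0 is r1, column 1 is r2.
    nearest = np.partition(dist, 1, axis=1)[:, :2]
    r1, r2 = nearest[:, 0], nearest[:, 1]

    # Ratio of second to first nearest-neighbor distance, always >= 1.
    mu = r2 / r1

    # MLE of the shape of a Pareto(1, d) sample: n / sum(log(mu)).
    return n / np.sum(np.log(mu))


if __name__ == "__main__":
    # Toy data: a 2-dimensional uniform sample embedded in 5 dimensions.
    rng = np.random.default_rng(0)
    flat = rng.uniform(size=(1000, 2))
    embedded = np.hstack([flat, np.zeros((1000, 3))])
    print(two_nn_id(embedded))
```

Because only the two closest neighbors of each point enter the estimate, the result reflects the structure of the data at a very small scale, which is why noisy measurements can affect it and why the abstract motivates estimators built on more general ratios of distances.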

research
02/23/2021

intRinsic: an R package for model-based estimation of the intrinsic dimension of a dataset

The estimation of the intrinsic dimension of a dataset is a fundamental ...
research
06/23/2020

ABID: Angle Based Intrinsic Dimensionality

The intrinsic dimensionality refers to the “true” dimensionality of the ...
research
11/08/2017

Dimension Estimation Using Random Connection Models

Information about intrinsic dimension is crucial to perform dimensionali...
research
03/15/2012

Regularized Maximum Likelihood for Intrinsic Dimension Estimation

We propose a new method for estimating the intrinsic dimension of a data...
research
02/04/2019

What is the dimension of your binary data?

Many 0/1 datasets have a very large number of variables; on the other ha...
research
09/29/2022

Intrinsic Dimensionality Estimation within Tight Localities: A Theoretical and Experimental Analysis

Accurate estimation of Intrinsic Dimensionality (ID) is of crucial impor...
research
04/12/2019

Geometry-Aware Maximum Likelihood Estimation of Intrinsic Dimension

The existing approaches to intrinsic dimension estimation usually are no...
