Fast multivariate empirical cumulative distribution function with connection to kernel density estimation

05/07/2020
by Nicolas Langrené, et al.

This paper revisits the problem of computing empirical cumulative distribution functions (ECDFs) efficiently on large, multivariate datasets. Computing an ECDF at one evaluation point requires 𝒪(N) operations on a dataset composed of N data points. A direct evaluation of an ECDF at N evaluation points therefore requires 𝒪(N^2) operations, which is prohibitive for large-scale problems. Two fast and exact methods are proposed and compared. The first is based on fast summation in lexicographical order; it has 𝒪(N log N) complexity and requires the evaluation points to lie on a regular grid. The second is based on the divide-and-conquer principle; it has 𝒪(N log(N)^((d-1)∨1)) complexity, where ∨ denotes the maximum, and requires the evaluation points to coincide with the input points. Both algorithms are described in detail in the general d-dimensional case, and numerical experiments validate their speed and accuracy. The paper also establishes a direct connection between cumulative distribution functions and kernel density estimation (KDE) for a large class of kernels. This connection paves the way for fast exact algorithms for multivariate kernel density estimation and kernel regression. Numerical tests with the Laplacian kernel validate the speed and accuracy of the proposed algorithms. A broad range of large-scale multivariate density estimation, cumulative distribution estimation, survival function estimation and regression problems can benefit from the proposed numerical methods.
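To make the cost difference concrete, the following minimal Python sketch contrasts the direct evaluation with a grid-based evaluation. It is not the paper's fast summation algorithm: it swaps in a simpler stand-in (exact binning onto the grid followed by cumulative sums along each axis) that illustrates why grid evaluation points allow the quadratic pairwise comparison to be avoided. The function names `ecdf_naive` and `ecdf_on_grid` are hypothetical, and the grid is assumed to be given as one sorted array per dimension.

```python
import numpy as np


def ecdf_naive(data, points):
    """Direct ECDF: compare every evaluation point with every data point.

    data:   array of shape (N, d)
    points: array of shape (M, d)
    Cost is O(N * M * d), i.e. quadratic when M is of order N.
    """
    # leq[m, n] is True when data[n] <= points[m] in every coordinate.
    leq = np.all(data[None, :, :] <= points[:, None, :], axis=2)
    return leq.mean(axis=1)


def ecdf_on_grid(data, grids):
    """Exact ECDF on a rectilinear grid via binning and cumulative sums.

    data:  array of shape (N, d)
    grids: list of d sorted 1-D arrays (assumed ascending), one per dimension
    Returns an array of shape (len(grids[0]), ..., len(grids[d-1])).
    """
    n, d = data.shape
    # For each coordinate, find the smallest grid index j with x <= grid[j].
    # Points beyond the last grid value get index len(grid), an overflow bin.
    idx = tuple(
        np.searchsorted(grids[k], data[:, k], side="left") for k in range(d)
    )
    shape = tuple(len(g) + 1 for g in grids)
    counts = np.zeros(shape, dtype=np.int64)
    np.add.at(counts, idx, 1)  # unbuffered add, handles repeated indices
    # Cumulative sums along every axis turn bin counts into counts of
    # data points dominated by each grid node.
    for axis in range(d):
        counts = np.cumsum(counts, axis=axis)
    # Drop the overflow bins, which do not correspond to grid nodes.
    inner = tuple(slice(0, len(g)) for g in grids)
    return counts[inner] / n
```

With N data points and a grid of M nodes in total, this sketch costs roughly 𝒪(dN log M + dM) instead of 𝒪(dNM), and both functions return identical values on the grid nodes because the binning is exact rather than approximate.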

research
12/04/2017

Fast and stable multivariate kernel density estimation by fast sum updating

Kernel density estimation and kernel regression are powerful but computa...
research
06/02/2018

Fast Exact Univariate Kernel Density Estimation

This paper presents new methodology for computationally efficient kernel...
research
04/16/2022

FKreg: A MATLAB toolbox for fast Multivariate Kernel Regression

Kernel smooth is the most fundamental technique for data density and reg...
research
11/21/2008

Kernel Regression by Mode Calculation of the Conditional Probability Distribution

The most direct way to express arbitrary dependencies in datasets is to ...
research
06/03/2020

Plots of the cumulative differences between observed and expected values of ordered Bernoulli variates

Many predictions are probabilistic in nature; for example, a prediction ...
research
06/08/2016

Fast and Extensible Online Multivariate Kernel Density Estimation

We present xokde++, a state-of-the-art online kernel density estimation ...
research
05/19/2022

Metrics of calibration for probabilistic predictions

Predictions are often probabilities; e.g., a prediction could be for pre...
