Convergence of Smoothed Empirical Measures with Applications to Entropy Estimation

05/30/2019
by Ziv Goldfeld et al.

This paper studies the convergence of empirical measures smoothed by a Gaussian kernel. Specifically, consider approximating P∗N_σ, for N_σ = N(0, σ^2 I_d), by P̂_n∗N_σ, where P̂_n is the empirical measure, under different statistical distances. The convergence is examined in terms of the Wasserstein distance, total variation (TV) distance, Kullback-Leibler (KL) divergence, and χ^2-divergence. We show that the approximation error under the TV distance and 1-Wasserstein distance (W_1) converges at rate e^O(d)n^-1/2, in remarkable contrast to the typical n^-1/d rate for unsmoothed W_1 (when d ≥ 3). For the KL divergence, squared 2-Wasserstein distance (W_2^2), and χ^2-divergence, the convergence rate is e^O(d)n^-1, but only if P achieves finite input-output χ^2 mutual information across the additive white Gaussian noise channel. If the latter condition is not met, the rate changes to ω(n^-1) for the KL divergence and W_2^2, while the χ^2-divergence becomes infinite, a curious dichotomy. As a main application we consider estimating the differential entropy h(P∗N_σ) in the high-dimensional regime, where the distribution P is unknown but n i.i.d. samples from it are available. We first show that any estimator of h(P∗N_σ) attaining nontrivial accuracy must have a sample complexity that is exponential in d. Using the empirical approximation results, we then show that the absolute-error risk of the plug-in estimator h(P̂_n∗N_σ) converges at the parametric rate e^O(d)n^-1/2, thus establishing the minimax rate-optimality of the plug-in. Numerical results demonstrating a significant empirical advantage of the plug-in approach over general-purpose differential entropy estimators are also provided.
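To make the plug-in estimator concrete: P̂_n∗N_σ is an n-component Gaussian mixture with components centered at the observed samples, so h(P̂_n∗N_σ) can be approximated by Monte Carlo integration against that mixture. The Python sketch below illustrates this idea; the function name plugin_entropy, the NumPy/SciPy helpers, and the uniform example distribution are illustrative assumptions, not code from the paper.

import numpy as np
from scipy.spatial.distance import cdist
from scipy.special import logsumexp

def plugin_entropy(samples, sigma, n_mc=10_000, seed=None):
    """Monte Carlo estimate (in nats) of h(P_hat_n * N_sigma), the differential
    entropy of the Gaussian-smoothed empirical measure of `samples` (shape (n, d))."""
    rng = np.random.default_rng(seed)
    n, d = samples.shape

    # Draw Y_j ~ P_hat_n * N_sigma: pick a data point uniformly, add Gaussian noise.
    idx = rng.integers(n, size=n_mc)
    Y = samples[idx] + sigma * rng.standard_normal((n_mc, d))

    # log q(Y_j), where q(y) = (1/n) sum_i N(y; X_i, sigma^2 I_d) is the mixture density.
    sq_dists = cdist(Y, samples, metric="sqeuclidean")                       # (n_mc, n)
    log_kernel = -sq_dists / (2 * sigma**2) - 0.5 * d * np.log(2 * np.pi * sigma**2)
    log_q = logsumexp(log_kernel, axis=1) - np.log(n)

    # h(P_hat_n * N_sigma) = -E[log q(Y)], estimated by the empirical mean.
    return -log_q.mean()

if __name__ == "__main__":
    # Illustrative P: uniform on [0,1]^3, available to the estimator only through samples.
    X = np.random.default_rng(0).random((500, 3))
    print(plugin_entropy(X, sigma=0.1, seed=1))

The Monte Carlo step only replaces the intractable d-dimensional integral over the mixture density; its error is controlled by n_mc and is separate from the e^O(d)n^-1/2 statistical rate discussed in the abstract.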

