Data Amplification: Instance-Optimal Property Estimation

03/04/2019
by Yi Hao, et al.

The best-known and most commonly used distribution-property estimation technique uses a plug-in estimator, with empirical frequency replacing the underlying distribution. We present novel linear-time-computable estimators that significantly "amplify" the effective amount of data available. For a large variety of distribution properties, including four of the most popular ones, and for every underlying distribution, they achieve the accuracy that the empirical-frequency plug-in estimators would attain using a logarithmic factor more samples. Specifically, for Shannon entropy and a broad class of properties including ℓ_1-distance, the new estimators use n samples to achieve the accuracy attained by the empirical estimators with n log n samples. For support size and coverage, the new estimators use n samples to achieve the performance of empirical frequency with sample size n times the logarithm of the property value. Significantly strengthening the traditional min-max formulation, these results hold not only for the worst distributions, but for each and every underlying distribution. Furthermore, the logarithmic amplification factors are optimal. Experiments on a wide variety of distributions show that the new estimators outperform the previous state-of-the-art estimators designed for each specific property.
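
For orientation, the baseline that the abstract compares against can be sketched in a few lines. The snippet below is only an illustration of the empirical-frequency plug-in approach (estimate the distribution by sample frequencies, then evaluate the property on that estimate); it is not the paper's new estimator. The function names, the natural-log convention for entropy, and the dictionary representation of the reference distribution q are assumptions made for this sketch.

```python
from collections import Counter
import math


def empirical_distribution(samples):
    """Map each observed symbol to its empirical frequency (count / n)."""
    if not samples:
        raise ValueError("need at least one sample")
    n = len(samples)
    return {symbol: count / n for symbol, count in Counter(samples).items()}


def plugin_entropy(samples):
    """Plug-in Shannon entropy in nats: -sum p_hat * log(p_hat)."""
    p_hat = empirical_distribution(samples)
    return -sum(p * math.log(p) for p in p_hat.values())


def plugin_support_size(samples):
    """Plug-in support size: the number of distinct symbols observed."""
    return len(empirical_distribution(samples))


def plugin_l1_distance(samples, q):
    """Plug-in l1-distance between the empirical distribution and a fixed q.

    q is a dict from symbol to probability (a hypothetical representation
    chosen for this sketch); symbols missing from either side count as 0.
    """
    p_hat = empirical_distribution(samples)
    symbols = set(p_hat) | set(q)
    return sum(abs(p_hat.get(s, 0.0) - q.get(s, 0.0)) for s in symbols)


if __name__ == "__main__":
    data = list("aabbbc")
    print(plugin_entropy(data))
    print(plugin_support_size(data))
    print(plugin_l1_distance(data, {"a": 0.5, "b": 0.25, "c": 0.25}))
```

Each function is a single pass over the sample counts, which is why the plug-in approach is the common default; the abstract's claim is that the new estimators match what these plug-in estimates would achieve with a logarithmic factor more samples.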

research
03/29/2019

Data Amplification: A Unified and Competitive Approach to Property Estimation

Estimating properties of discrete distributions is a fundamental problem...
research
06/10/2019

The Broad Optimality of Profile Maximum Likelihood

We study three fundamental statistical-learning problems: distribution e...
research
05/21/2019

Efficient Profile Maximum Likelihood for Universal Symmetric Property Estimation

Estimating symmetric properties of a distribution, e.g. support size, co...
research
11/08/2019

Unified Sample-Optimal Property Estimation in Near-Linear Time

We consider the fundamental learning problem of estimating properties of...
research
06/25/2019

Distribution-robust mean estimation via smoothed random perturbations

We consider the problem of mean estimation assuming only finite variance...
research
01/19/2015

Structure Learning in Bayesian Networks of Moderate Size by Efficient Sampling

We study the Bayesian model averaging approach to learning Bayesian netw...
research
07/06/2018

Outperforming Good-Turing: Preliminary Report

Estimating a large alphabet probability distribution from a limited numb...
