Estimating Entropy of Distributions in Constant Space
We consider the task of estimating the entropy of k-ary distributions from samples in the streaming model, where space is limited. Our main contribution is an algorithm that requires O(k log^2(1/ε)/ε^3) samples and only O(1) words of memory, and outputs a ±ε estimate of H(p). Without space limitations, the sample complexity has been established as S(k, ε) = Θ(k/(ε log k) + log^2(k)/ε^2), which is sub-linear in the domain size k, but the existing algorithms that achieve this optimal sample complexity also require nearly-linear space in k. Our algorithm partitions [0,1] into intervals and estimates the entropy contribution of the probability values falling in each interval. The intervals are designed to trade off the bias and variance of these estimates.
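To make the sampling idea concrete, here is a minimal, self-contained Python sketch of the general approach behind constant-space entropy estimation: draw a symbol X from the stream, estimate p(X) from a fresh batch of samples, and average log(1/p̂(X)), since E[log(1/p(X))] = H(p). This is only an illustrative toy, not the paper's algorithm; the names entropy_estimate, stream_sampler, repetitions, and batch_size are invented for this sketch, and the naive plug-in estimate below has exactly the bias/variance problem that the paper's interval partition is designed to control.

import math
import random

def entropy_estimate(stream_sampler, repetitions=1000, batch_size=1000):
    # stream_sampler() returns one i.i.d. sample from the unknown distribution p.
    # Only O(1) words are kept across the whole run: the running total,
    # the current symbol, and the hit counter.
    total = 0.0
    for _ in range(repetitions):
        x = stream_sampler()                              # one sample X ~ p
        hits = sum(stream_sampler() == x for _ in range(batch_size))
        p_hat = max(hits, 1) / batch_size                 # crude estimate of p(X)
        total += math.log(1.0 / p_hat)                    # plug-in "surprise"; biased
    return total / repetitions

# Example on a skewed 4-ary distribution.
if __name__ == "__main__":
    probs = [0.5, 0.25, 0.15, 0.10]
    sampler = lambda: random.choices(range(4), weights=probs)[0]
    true_H = -sum(p * math.log(p) for p in probs)
    print("estimate:", entropy_estimate(sampler), "true:", true_H)

The sketch makes the constant-space property visible (no per-symbol counts over the whole domain are stored), while the gap between its naive log(1/p̂) plug-in and an unbiased ±ε estimate is what the interval-based bias/variance trade-off in the paper addresses.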