Likelihood-free hypothesis testing

11/02/2022
by Patrik Róbert Gerber, et al.

Consider the problem of testing Z ∼ ℙ^⊗m vs Z ∼ ℚ^⊗m from m samples. Generally, to achieve a small error rate it is necessary and sufficient to have m ≍ 1/ϵ^2, where ϵ measures the separation between ℙ and ℚ in total variation (𝖳𝖵). Achieving this, however, requires complete knowledge of the distributions ℙ and ℚ and can be done, for example, using the Neyman-Pearson test. In this paper we consider a variation of the problem, which we call likelihood-free (or simulation-based) hypothesis testing, where access to ℙ and ℚ (which are a priori only known to belong to a large non-parametric family 𝒫) is given through n iid samples from each. We demonstrate the existence of a fundamental trade-off between n and m given by nm ≍ n^2_𝖦𝗈𝖥(ϵ,𝒫), where n_𝖦𝗈𝖥 is the minimax sample complexity of testing between the hypotheses H_0: ℙ = ℚ vs H_1: 𝖳𝖵(ℙ,ℚ) ≥ ϵ. We show this for three non-parametric families 𝒫: β-smooth densities over [0,1]^d, the Gaussian sequence model over a Sobolev ellipsoid, and the collection of distributions on a large alphabet [k] with pmfs bounded by c/k for fixed c. The test that we propose (based on the L^2-distance statistic of Ingster) simultaneously achieves all points on the trade-off curve for these families. In particular, when m ≫ 1/ϵ^2 our test requires the number of simulation samples n to be orders of magnitude smaller than what is needed for density estimation to accuracy ≍ ϵ (under 𝖳𝖵). This demonstrates the possibility of testing without fully estimating the distributions.
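To make the setting concrete, here is a minimal Python sketch of a likelihood-free test in the spirit described above: it bins the n simulation samples from each of ℙ and ℚ and the m observed samples Z, and declares Z ∼ ℙ whenever the empirical L^2 distance from Z to ℙ is smaller than to ℚ. This is an illustrative toy, not the paper's exact statistic (which uses bias-corrected U-statistics and a resolution tuned to β, ϵ, n, and m); the bin count k and the Beta-distribution example are assumptions made here for demonstration.

    import numpy as np

    def lfht_test(p_samples, q_samples, z_samples, k):
        """Decide whether Z was drawn from P or Q via a binned L^2 statistic.

        Illustrative sketch only: samples are assumed to lie in [0,1] and are
        discretized into k equal-width bins; the resolution k is a free
        parameter here, whereas the paper tunes it to the smoothness beta,
        the separation epsilon, and the sample sizes n and m.
        """
        bins = np.linspace(0.0, 1.0, k + 1)
        p_hat = np.histogram(p_samples, bins=bins)[0] / len(p_samples)
        q_hat = np.histogram(q_samples, bins=bins)[0] / len(q_samples)
        z_hat = np.histogram(z_samples, bins=bins)[0] / len(z_samples)
        # Signed L^2 statistic: negative values indicate Z is closer to P.
        t = np.sum((z_hat - p_hat) ** 2) - np.sum((z_hat - q_hat) ** 2)
        return "P" if t <= 0 else "Q"

    # Toy usage: two Beta distributions separated in TV; truth is Z ~ P.
    rng = np.random.default_rng(0)
    p = rng.beta(2, 5, size=5000)   # n simulation samples from P
    q = rng.beta(5, 2, size=5000)   # n simulation samples from Q
    z = rng.beta(2, 5, size=200)    # m observed samples
    print(lfht_test(p, q, z, k=32))  # prints "P"

The point of the trade-off nm ≍ n^2_𝖦𝗈𝖥(ϵ,𝒫) is visible in this setup: with more observed samples m, the simulation budget n per hypothesis can shrink well below what density estimation at accuracy ϵ would require.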
