Distributions associated with simultaneous multiple hypothesis testing

02/25/2018

We develop the distribution of the number of hypotheses found to be statistically significant using the rule from Benjamini and Hochberg (1995) for controlling the false discovery rate (FDR). This distribution has both a small-sample form and an asymptotic expression for testing many independent hypotheses simultaneously. We propose a parametric distribution Ψ_I(·) to approximate the marginal distribution of p-values under a non-uniform alternative hypothesis. This distribution is useful when there are many different alternative hypotheses that are not individually well understood. We fit Ψ_I to data from three cancer studies and use it to illustrate the distribution of the number of notable hypotheses observed in these examples. We model dependence among sampled p-values using a copula model and a latent variable approach. These methods can be combined to illustrate a power analysis for planning a large study on the basis of a smaller pilot study. We show that the number of statistically significant p-values behaves approximately as a mixture of a normal distribution and the Borel-Tanner distribution.
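
As a rough illustration of the quantity studied here, the sketch below counts the hypotheses rejected by the Benjamini-Hochberg step-up rule and tabulates that count over simulated data sets. It is not the paper's derivation: the function names `bh_rejections` and `simulate_counts` are hypothetical, and the simulation assumes independent, uniformly distributed p-values (every null hypothesis true), not the Ψ_I alternative or the copula model described above.

```python
import random
from collections import Counter


def bh_rejections(p_values, alpha=0.05):
    """Number of hypotheses rejected by the Benjamini-Hochberg step-up rule.

    Sort the m p-values and find the largest rank k with
    p_(k) <= alpha * k / m; the k smallest p-values are rejected.
    """
    m = len(p_values)
    ordered = sorted(p_values)
    k = 0
    for rank, p in enumerate(ordered, start=1):
        if p <= alpha * rank / m:
            k = rank  # step-up: keep the largest rank that qualifies
    return k


def simulate_counts(m, alpha=0.05, n_rep=1000, seed=0):
    """Empirical distribution of the rejection count.

    Assumption (illustration only): all m hypotheses are null and
    their p-values are independent Uniform(0, 1) draws.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_rep):
        p_values = [rng.random() for _ in range(m)]
        counts[bh_rejections(p_values, alpha)] += 1
    return counts


if __name__ == "__main__":
    counts = simulate_counts(m=100, alpha=0.05, n_rep=2000)
    for k in sorted(counts):
        print(k, counts[k])
```

Because the rule is step-up, a large p-value that falls under its own threshold also brings in every smaller p-value, even those that missed their individual thresholds; this is why the loop records the largest qualifying rank and does not stop at the first failure.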
