On the expected runtime of multiple testing algorithms with bounded error
Consider testing multiple hypotheses in the setting where the p-values of all hypotheses are unknown and thus have to be approximated using Monte Carlo simulations. One class of algorithms published in the literature for this scenario provides guarantees on the correctness of its testing result (for instance, a bound on the resampling risk) by computing confidence statements on all approximated p-values. This article focuses on the expected runtime of those algorithms and shows the following four main results. (1) Computing a decision on a single hypothesis tested at a fixed threshold requires an infinite expected runtime. (2) In applications relying on the decisions of multiple hypotheses computed with a Bonferroni-type threshold, all but two hypotheses can be decided in finite expected runtime. (3) This result does not extend to applications which require full knowledge of all individual decisions (for instance, step-up or step-down procedures), in which case no algorithm can guarantee even a single decision in finite expected runtime. (4) Nevertheless, simulations show that in practice the number of pending decisions typically remains low.
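To make the setting concrete, the following is a minimal sketch of one way a single hypothesis could be decided at a fixed threshold using confidence statements on a Monte Carlo p-value estimate. It is an illustration under stated assumptions, not the algorithm analysed in the article: the function name `decide`, the Hoeffding-type interval, the per-check error budget `delta / (k * (k + 1))`, and the sample cap `max_samples` are all choices made here, and `p_true` is a hypothetical stand-in for the unknown p-value, used only to simulate exceedance indicators.

```python
import math
import random


def decide(p_true, alpha=0.05, delta=0.01, batch=100, max_samples=100_000, rng=None):
    """Sequentially decide whether an unknown p-value lies below or above alpha.

    p_true is hypothetical: it replaces the real (unknown) p-value and is
    used only to simulate whether a resampled statistic exceeds the observed one.
    Returns a pair (decision, number of Monte Carlo samples drawn).
    """
    rng = rng or random.Random()
    hits = 0  # simulated exceedances so far
    n = 0     # Monte Carlo samples drawn so far
    k = 0     # index of the current interval check
    while n < max_samples:
        hits += sum(rng.random() < p_true for _ in range(batch))
        n += batch
        k += 1
        # Spend delta / (k * (k + 1)) at check k; these budgets sum to at most
        # delta, so all intervals hold simultaneously with prob. >= 1 - delta.
        delta_k = delta / (k * (k + 1))
        half_width = math.sqrt(math.log(2.0 / delta_k) / (2.0 * n))
        p_hat = hits / n
        if p_hat + half_width < alpha:
            return "reject", n       # interval lies entirely below alpha
        if p_hat - half_width > alpha:
            return "non-reject", n   # interval lies entirely above alpha
    return "undecided", n            # cap reached, decision still pending


if __name__ == "__main__":
    rng = random.Random(1)
    for p in (0.001, 0.05, 0.5):
        print(p, decide(p, rng=rng))
```

In this sketch, a p-value far from the threshold is decided after few samples, while a p-value equal or very close to `alpha` keeps the interval straddling the threshold and the run ends at the cap. That behaviour is an informal picture of why the expected runtime for a fixed threshold can fail to be finite; the cap itself is an assumption added here so that the example terminates.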