Controlling FDR while highlighting distinct discoveries
Modern scientific investigations often start by testing a very large number of hypotheses in an effort to comprehensively mine the data for possible discoveries. Multiplicity adjustment strategies are employed to ensure replicability of the results of this broad search. Furthermore, in many cases, discoveries are subject to a second round of filtering, where researchers select the rejected hypotheses that better represent distinct and interpretable findings for reporting and follow-up. For example, in genetic studies, one DNA variant is often chosen to represent a group of neighboring polymorphisms, all apparently associated with a trait of interest. Unfortunately, the guarantees of false discovery rate (FDR) control that may hold for the initial set of findings do not carry over to this second, filtered set. Indeed, we observe that some filters used in practice tend to keep a larger fraction of nulls than non-nulls, thereby inflating the FDR. To overcome this, we introduce Focused BH, a multiple testing procedure that accounts for the filtering step, allowing the researcher to rely on the data and on the results of testing to filter the rejection set, while ensuring FDR control under a range of assumptions on the filter and the p-value dependency structure. Simulations illustrate that FDR control on the filtered set of discoveries is obtained without substantial power loss and that the procedure is robust to violations of our theoretical assumptions. Notable applications of Focused BH include control of the outer node FDR when testing hypotheses on a tree.
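The inflation described above can be seen in a small toy calculation. The sketch below is illustrative only: the p-values, the null labels, the grouping, and the filter `keep_best_per_group` are all made up for this example, and `filter_aware_rejections` is an assumed form of a filter-aware threshold rule, not a definition taken from the text above. It applies the standard Benjamini-Hochberg step-up rule, filters the rejections down to one representative per group, and compares the false discovery proportion (FDP) before and after filtering.

```python
from typing import Callable, List, Sequence


def benjamini_hochberg(pvals: Sequence[float], q: float) -> List[int]:
    """Standard BH step-up rule: indices rejected at target level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            k = rank  # largest rank whose p-value passes its threshold
    return sorted(order[:k])


def keep_best_per_group(selected: Sequence[int],
                        pvals: Sequence[float],
                        groups: Sequence[int]) -> List[int]:
    """Hypothetical filter: keep the smallest p-value within each group."""
    best = {}
    for i in selected:
        g = groups[i]
        if g not in best or pvals[i] < pvals[best[g]]:
            best[g] = i
    return sorted(best.values())


def fdp(selected: Sequence[int], is_null: Sequence[bool]) -> float:
    """False discovery proportion of a selected set (0 if empty)."""
    if not selected:
        return 0.0
    return sum(is_null[i] for i in selected) / len(selected)


def filter_aware_rejections(pvals: Sequence[float], q: float,
                            filt: Callable[[List[int]], List[int]]) -> List[int]:
    """Assumed filter-aware rule (an illustration, not the paper's text):
    take the largest observed threshold t with m * t / |filtered set| <= q,
    and report the filtered set at that threshold."""
    m = len(pvals)
    best: List[int] = []
    for t in sorted(set(pvals)):  # ascending, so the last pass is the largest t
        kept = filt([i for i in range(m) if pvals[i] <= t])
        if kept and m * t / len(kept) <= q:
            best = kept
    return best


# Made-up data: four neighboring non-nulls in one group, six singleton nulls.
pvals = [0.001, 0.002, 0.003, 0.004, 0.03, 0.4, 0.5, 0.6, 0.7, 0.8]
is_null = [False] * 4 + [True] * 6
groups = [0, 0, 0, 0, 1, 2, 3, 4, 5, 6]
q = 0.1

bh_set = benjamini_hochberg(pvals, q)
filtered_set = keep_best_per_group(bh_set, pvals, groups)
aware_set = filter_aware_rejections(
    pvals, q, lambda idx: keep_best_per_group(idx, pvals, groups))

print("BH rejections:", bh_set, "FDP:", fdp(bh_set, is_null))
print("After filtering:", filtered_set, "FDP:", fdp(filtered_set, is_null))
print("Filter-aware:", aware_set, "FDP:", fdp(aware_set, is_null))
```

In this toy setup the filter collapses the four correlated true signals into one representative while the lone null rejection survives untouched, so the share of nulls among the reported findings grows. A single data set only shows the FDP, not the FDR, which is an expectation over repeated experiments.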