Where to find needles in a haystack?
Many existing multiple comparison methods start with either Fisher's p-values or local fdr scores. The former, usually defined as the tail probability of exceeding the observed test statistic under the null distribution, does not use information from the alternative hypothesis, and the targeted region of signals can be completely wrong, especially when the likelihood ratio function is not monotone. Local fdr based approaches, which usually rely on density functions, are optimal in the oracle setting. However, the targeted region of signals in their data-driven versions is problematic because nonparametric density estimation converges slowly, especially at the boundaries. In this paper, we propose a new method, the Cdf and Local fdr Assisted multiple Testing method (CLAT), which is optimal in cases where p-value based methods are not. Additionally, its data-driven version relies only on estimation of the cumulative distribution function and converges quickly to the oracle version. Both simulations and real data analysis demonstrate that the proposed method outperforms existing ones. Furthermore, a novel algorithm makes the computation instantaneous and scalable to large data sets.
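The point about non-monotone likelihood ratios can be illustrated with a small numerical sketch. It is not the CLAT procedure; it only evaluates the standard definitions of a right-tail p-value and of the local fdr in a two-group normal mixture. The null proportion (0.9) and the alternative distribution N(2, 0.5^2) are assumptions chosen for illustration and do not come from the paper.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def right_tail_p_value(x):
    """P(Z >= x) for Z ~ N(0, 1): the usual one-sided p-value under the null."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def local_fdr(x, pi0=0.9, alt_mean=2.0, alt_sd=0.5):
    """Local fdr pi0*f0 / (pi0*f0 + (1 - pi0)*f1) in a two-group mixture.

    pi0, alt_mean and alt_sd are illustrative assumptions, not values from the paper.
    """
    f0 = normal_pdf(x, 0.0, 1.0)        # null density N(0, 1)
    f1 = normal_pdf(x, alt_mean, alt_sd)  # assumed alternative density
    return pi0 * f0 / (pi0 * f0 + (1.0 - pi0) * f1)


if __name__ == "__main__":
    for x in (2.5, 5.0):
        print(f"x={x}: p-value={right_tail_p_value(x):.3g}, local fdr={local_fdr(x):.3g}")
```

Under these assumptions the alternative has lighter tails than the null, so the likelihood ratio f1/f0 rises and then falls. An observation at x = 5 has a much smaller p-value than one at x = 2.5, yet its local fdr is close to 1, so a p-value threshold would target the region x > c while the local fdr targets a bounded interval.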