A Distribution-Free Test of Independence and Its Application to Variable Selection
Motivated by the importance of measuring the association between the response and predictors in high-dimensional data, we propose a new mean variance (MV) test of independence between a categorical random variable and a continuous one, based on the mean variance index. The mean variance index is zero if and only if the two variables are independent. Under independence, we derive an explicit form of its asymptotic null distribution, which provides an efficient and fast way to compute the empirical p-value in practice. The number of classes of the categorical variable is allowed to diverge slowly to infinity. The test is essentially a rank test and thus distribution-free: no assumption on the distributions of the two random variables is required, and the test statistic is invariant under one-to-one transformations. It is resistant to heavy-tailed distributions and extreme values. We assess its performance by Monte Carlo simulations and demonstrate that the proposed test achieves higher power than existing tests. We apply the proposed MV test to high-dimensional colon cancer gene expression data to detect significant genes associated with the tissue syndrome.
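As a rough illustration of how such a rank-based statistic might be computed, the sketch below assumes the sample mean variance index takes the form (1/n) Σ_r Σ_i p̂_r [F̂_r(X_i) − F̂(X_i)]², where F̂ is the empirical distribution function of the continuous variable, F̂_r is the empirical distribution function within class r, and p̂_r is the class proportion. This form is not stated in the abstract and is an assumption here. The explicit asymptotic null distribution is likewise not given in the abstract, so the sketch substitutes a plain permutation p-value for it. The function names `mv_index` and `mv_permutation_pvalue` are hypothetical.

```python
import numpy as np


def mv_index(x, y):
    """Sample mean variance index under the assumed form
    (1/n) * sum_r sum_i p_r * (F_r(x_i) - F(x_i))**2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    n = x.size
    # Unconditional empirical CDF evaluated at every observation.
    F = np.searchsorted(np.sort(x), x, side="right") / n
    total = 0.0
    for label in np.unique(y):
        x_r = np.sort(x[y == label])
        p_r = x_r.size / n
        # Class-conditional empirical CDF evaluated at every observation.
        F_r = np.searchsorted(x_r, x, side="right") / x_r.size
        total += p_r * np.sum((F_r - F) ** 2)
    return total / n


def mv_permutation_pvalue(x, y, n_perm=999, seed=0):
    """Permutation p-value; a stand-in for the asymptotic null
    distribution, which the abstract mentions but does not state."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    observed = mv_index(x, y)
    # Count label shuffles whose index is at least as large as observed.
    exceed = sum(
        mv_index(x, rng.permutation(y)) >= observed for _ in range(n_perm)
    )
    return (exceed + 1) / (n_perm + 1)
```

Because the statistic depends on the continuous variable only through empirical distribution functions, that is, through ranks, applying a strictly increasing transformation to `x` leaves `mv_index(x, y)` unchanged. This is consistent with the invariance and robustness to extreme values described above.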