Generalized R^2 Measures for a Mixture of Bivariate Linear Dependences
Motivated by the pressing need to capture complex but interpretable variable relationships in scientific research, we develop a new mathematical foundation and statistical methodologies that generalize the squared Pearson correlation, i.e., R^2, to capture a mixture of linear dependences between two real-valued random variables. We define the population and sample generalized R^2 measures under the supervised and unsupervised scenarios, and we derive the asymptotic distributions of the sample measures to enable computationally efficient statistical inference on the population measures. To compute the sample generalized R^2 measure under the unsupervised scenario, we develop a K-lines clustering algorithm and investigate its connection to gradient descent and expectation-maximization algorithms. Our simulation results numerically corroborate the theoretical results. Two real-data genomic applications demonstrate the effectiveness of the generalized R^2 measures in capturing interpretable gene-gene relationships that are likely missed by existing association measures. The estimation and inference procedures are implemented in the R package gR2.
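To make the two scenarios concrete, the following Python sketch may help. It is not the gR2 package and not necessarily the authors' exact definitions: it assumes that the sample measure is a proportion-weighted sum of within-group squared Pearson correlations, and that K-lines clustering alternates between fitting each line by orthogonal regression and reassigning each point to its nearest line by perpendicular distance. The function names (`supervised_gr2`, `fit_line`, `k_lines`) are hypothetical.

```python
import numpy as np


def supervised_gr2(x, y, labels):
    """Assumed form: sum over groups of (group proportion) * (squared Pearson r)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for g in np.unique(labels):
        mask = labels == g
        # Assumption: tiny or constant groups contribute zero instead of NaN.
        if mask.sum() < 3 or np.std(x[mask]) == 0 or np.std(y[mask]) == 0:
            continue
        r = np.corrcoef(x[mask], y[mask])[0, 1]
        total += mask.mean() * r ** 2
    return total


def fit_line(points):
    """Orthogonal-regression line through 2-D points: (center, unit normal)."""
    center = points.mean(axis=0)
    cov = np.cov((points - center).T)
    # eigh returns eigenvalues in ascending order, so column 0 is the
    # direction of least variance, i.e. the normal of the fitted line.
    _, vecs = np.linalg.eigh(cov)
    return center, vecs[:, 0]


def k_lines(x, y, k, n_iter=100, seed=0):
    """Alternate line fitting and nearest-line reassignment (a sketch)."""
    pts = np.column_stack([x, y]).astype(float)
    rng = np.random.default_rng(seed)
    labels = rng.integers(k, size=len(pts))  # random initial assignment
    for _ in range(n_iter):
        dists = np.full((len(pts), k), np.inf)
        for j in range(k):
            members = pts[labels == j]
            if len(members) < 2:
                continue  # an empty or single-point cluster defines no line
            center, normal = fit_line(members)
            # Perpendicular distance of every point to line j.
            dists[:, j] = np.abs((pts - center) @ normal)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
    return labels
```

As a small check of the supervised form: for the two crossing lines y = x and y = 3 - x, each sampled at x = 0, 1, 2, 3, the ordinary squared Pearson correlation of the pooled data is 0, while the group-wise measure computed with the true line labels is 1. The unsupervised version would call `k_lines(x, y, k)` first and pass the resulting labels to `supervised_gr2`; like other alternating schemes it can stop at a local optimum, so several random seeds are advisable.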