Sample canonical correlation coefficients of high-dimensional random vectors: local law and Tracy-Widom limit
Consider two random vectors $C_1^{1/2}x \in \mathbb{R}^p$ and $C_2^{1/2}y \in \mathbb{R}^q$, where the entries of $x$ and $y$ are i.i.d. random variables with mean zero and variance one, and $C_1$ and $C_2$ are $p \times p$ and $q \times q$ deterministic population covariance matrices. With $n$ independent samples of $(C_1^{1/2}x, C_2^{1/2}y)$, we study the sample correlation between these two vectors using canonical correlation analysis. We denote by $S_{xx}$ and $S_{yy}$ the sample covariance matrices for $C_1^{1/2}x$ and $C_2^{1/2}y$, respectively, and by $S_{xy}$ the sample cross-covariance matrix. The sample canonical correlation coefficients are then the square roots of the eigenvalues of the sample canonical correlation matrix $C_{XY} := S_{xx}^{-1} S_{xy} S_{yy}^{-1} S_{yx}$. In the high-dimensional setting with $p/n \to c_1 \in (0, 1)$ and $q/n \to c_2 \in (0, 1-c_1)$ as $n \to \infty$, we prove that the largest eigenvalue of $C_{XY}$ converges to the Tracy-Widom distribution as long as $\lim_{s \to \infty} s^4 \left[ \mathbb{P}(|x_{ij}| \ge s) + \mathbb{P}(|y_{ij}| \ge s) \right] = 0$. This extends the result in [16], which established the Tracy-Widom limit of the largest eigenvalue of $C_{XY}$ under the assumption that all moments are finite. Our proof is based on a linearization method, which reduces the problem to the study of a $(p+q+2n) \times (p+q+2n)$ random matrix $H$. In particular, we prove an optimal local law for its inverse $G := H^{-1}$, i.e., the resolvent. This local law is the main tool both for the proof of the Tracy-Widom law in this paper and for the study in [22,23] of the canonical correlation coefficients of high-dimensional random vectors with finite rank correlations.
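As an illustration of the definition above, the following is a minimal numerical sketch (not taken from the paper) of how the sample canonical correlation coefficients can be computed from $S_{xx}$, $S_{yy}$ and $S_{xy}$. The function name `sample_canonical_correlations`, the choice of Gaussian entries with $C_1$ and $C_2$ equal to the identity, and the dimensions `p`, `q`, `n` are all assumptions made for the example. Since the entries have mean zero, the sketch does not center the data.

```python
import numpy as np


def sample_canonical_correlations(X, Y):
    """Return the sample canonical correlation coefficients, largest first.

    X is a p x n array and Y is a q x n array; each column is one sample.
    (Hypothetical helper written for this illustration.)
    """
    n = X.shape[1]
    S_xx = X @ X.T / n   # p x p sample covariance of the first vector
    S_yy = Y @ Y.T / n   # q x q sample covariance of the second vector
    S_xy = X @ Y.T / n   # p x q sample cross-covariance; S_yx is its transpose

    # C_XY = S_xx^{-1} S_xy S_yy^{-1} S_yx, formed with linear solves
    # rather than explicit matrix inverses.
    A = np.linalg.solve(S_xx, S_xy)      # S_xx^{-1} S_xy
    B = np.linalg.solve(S_yy, S_xy.T)    # S_yy^{-1} S_yx
    C_XY = A @ B

    # C_XY is not symmetric, but its eigenvalues are real and lie in [0, 1];
    # drop round-off imaginary parts and clip tiny excursions.
    eigvals = np.clip(np.linalg.eigvals(C_XY).real, 0.0, 1.0)
    return np.sqrt(np.sort(eigvals)[::-1])


# Assumed toy setting: independent Gaussian entries, C_1 = I_p, C_2 = I_q.
rng = np.random.default_rng(0)
p, q, n = 50, 30, 500
X = rng.standard_normal((p, n))
Y = rng.standard_normal((q, n))

r = sample_canonical_correlations(X, Y)
print("largest eigenvalue of C_XY:", r[0] ** 2)
```

Because $C_{XY}$ has rank at most $\min(p, q)$, only the first $\min(p, q)$ coefficients returned here can be nonzero; the rest are zero up to round-off.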