Minimax Quasi-Bayesian estimation in sparse canonical correlation analysis via a Rayleigh quotient function
Canonical correlation analysis (CCA) is a popular statistical technique for exploring the relationship between two datasets. The estimation of sparse canonical correlation vectors has emerged in recent years as an important but challenging variation of the CCA problem, with widespread applications. Currently available rate-optimal estimators for sparse canonical correlation vectors are expensive to compute. We propose a quasi-Bayesian estimation procedure that achieves the minimax estimation rate and yet is easy to compute by Markov chain Monte Carlo (MCMC). The method builds on [37] and uses a re-scaled Rayleigh quotient function as a quasi-log-likelihood. Unlike these authors, however, we adopt a Bayesian framework that combines this quasi-log-likelihood with a spike-and-slab prior, which regularizes the inference and promotes sparsity. We investigate the empirical behavior of the proposed method on both continuous and truncated data, and find that it outperforms several state-of-the-art methods. As an application, we use the methodology to maximally correlate clinical variables and proteomic data for a better understanding of COVID-19.
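To make the role of the Rayleigh quotient concrete, the following is a minimal sketch of the standard (unscaled, non-sparse) generalized Rayleigh quotient for CCA. It is not the paper's quasi-log-likelihood: the re-scaling, the spike-and-slab prior, and the MCMC sampler are not specified in this abstract and are not reproduced here. The function names, the simulated data, and the dimensions are all illustrative assumptions. For a stacked vector theta = (u, v), the quotient theta'A theta / theta'B theta, with A holding the sample cross-covariance in its off-diagonal blocks and B the block-diagonal within-set covariances, is maximized at the leading pair of canonical vectors, and its maximum equals the first sample canonical correlation.

```python
import numpy as np

def cca_matrices(X, Y):
    """Build the matrices A and B of the CCA generalized eigenproblem (illustrative helper)."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sx = Xc.T @ Xc / n      # sample covariance of X
    Sy = Yc.T @ Yc / n      # sample covariance of Y
    Sxy = Xc.T @ Yc / n     # sample cross-covariance
    p, q = Sx.shape[0], Sy.shape[0]
    A = np.block([[np.zeros((p, p)), Sxy], [Sxy.T, np.zeros((q, q))]])
    B = np.block([[Sx, np.zeros((p, q))], [np.zeros((q, p)), Sy]])
    return A, B

def rayleigh_quotient(theta, A, B):
    """Generalized Rayleigh quotient theta'A theta / theta'B theta."""
    return float(theta @ A @ theta) / float(theta @ B @ theta)

def leading_canonical_pair(A, B):
    """Maximize the quotient by solving A theta = lambda B theta via Cholesky whitening."""
    L = np.linalg.cholesky(B)        # B = L L'
    Linv = np.linalg.inv(L)
    M = Linv @ A @ Linv.T            # symmetric matrix with the same eigenvalues
    w, V = np.linalg.eigh(M)         # eigenvalues in ascending order
    theta = Linv.T @ V[:, -1]        # map the top eigenvector back to theta
    return w[-1], theta

# Simulated data (an assumption for illustration): one shared latent factor z
# drives the first coordinate of each dataset, so the true canonical vectors are sparse.
rng = np.random.default_rng(0)
n, p, q = 500, 6, 5
z = rng.standard_normal(n)
X = rng.standard_normal((n, p))
Y = rng.standard_normal((n, q))
X[:, 0] += z
Y[:, 0] += z

A, B = cca_matrices(X, Y)
rho, theta = leading_canonical_pair(A, B)
print("first sample canonical correlation:", rho)
print("quotient at the maximizer:", rayleigh_quotient(theta, A, B))
```

The unpenalized maximizer returned here is generally dense even when the true canonical vectors are sparse, which is the gap that a sparsity-promoting prior on theta is meant to close.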