De-biased sparse PCA: Inference and testing for eigenstructure of large covariance matrices

01/31/2018
by   Jana Jankova, et al.

Sparse principal component analysis (sPCA) has become one of the most widely used techniques for dimensionality reduction in high-dimensional datasets. The main challenge underlying sPCA is to estimate the first vector of loadings of the population covariance matrix under the assumption that only a few of its loadings are non-zero. In this paper, we propose confidence intervals for individual loadings and for the largest eigenvalue of the population covariance matrix. Given an i.i.d. sample X_i ∈ R^p, i = 1,...,n, drawn from an unknown distribution with unknown covariance matrix Σ_0, our aim is to estimate the first vector of loadings and the largest eigenvalue of Σ_0 in a setting where p ≫ n. Beyond the high dimensionality, a further challenge lies in the inherent non-convexity of the problem. We base our methodology on a Lasso-penalized M-estimator which, despite the non-convexity, can be solved by a polynomial-time algorithm such as coordinate or gradient descent. We show that this estimator achieves the minimax optimal rates in ℓ_1- and ℓ_2-norm. We then identify the bias in the Lasso-based estimator and propose a de-biased sparse PCA estimator for the vector of loadings and for the largest eigenvalue of Σ_0. Our main results provide theoretical guarantees for the asymptotic normality of the de-biased estimator. The main conditions we impose are sparsity of small order √n/log p in the first eigenvector and sparsity of the same order in the columns of the inverse Hessian matrix of the population risk.
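To make the setting concrete, the following is a minimal, generic sketch of ℓ_1-penalized estimation of the first loading vector via soft-thresholded power iteration on the sample covariance. This is a standard sparse-PCA heuristic, not the paper's exact M-estimator or its de-biasing step; the function name, the penalty level `lam`, and the simulation parameters are illustrative assumptions.

```python
import numpy as np

def sparse_pca_first_loading(X, lam=0.1, n_iter=200, tol=1e-8):
    """Estimate the first sparse loading vector and the top eigenvalue.

    Generic sketch (not the paper's estimator): alternate a power step on
    the sample covariance with soft-thresholding at level `lam`, then
    renormalize to unit length.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                       # sample covariance matrix
    v = np.linalg.eigh(S)[1][:, -1]         # dense PCA initializer
    for _ in range(n_iter):
        w = S @ v                           # power-iteration step
        w = np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)  # soft-threshold
        nrm = np.linalg.norm(w)
        if nrm == 0.0:                      # lam too large: everything killed
            break
        w /= nrm
        if np.linalg.norm(w - v) < tol:
            v = w
            break
        v = w
    eig_hat = float(v @ S @ v)              # plug-in top-eigenvalue estimate
    return v, eig_hat

# Illustrative spiked-covariance simulation: the true loading vector is
# supported on the first 3 of p = 50 coordinates.
rng = np.random.default_rng(0)
v0 = np.zeros(50)
v0[:3] = 1.0 / np.sqrt(3.0)
X = rng.standard_normal((400, 50)) + 3.0 * rng.standard_normal((400, 1)) * v0
v_hat, eig_hat = sparse_pca_first_loading(X, lam=0.05)
```

In this simulation the estimated loading vector aligns closely with v0 (up to sign) and `eig_hat` is near the spiked population eigenvalue; the de-biasing step analyzed in the paper would then correct such a penalized estimate so that confidence intervals for individual loadings become valid.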

