De-biased sparse PCA: Inference and testing for eigenstructure of large covariance matrices

01/31/2018
by Jana Jankova, et al.

Sparse principal component analysis (sPCA) has become one of the most widely used techniques for dimensionality reduction in high-dimensional datasets. The main challenge underlying sPCA is to estimate the first vector of loadings of the population covariance matrix, under the assumption that only a limited number of loadings are non-zero. In this paper, we propose confidence intervals for individual loadings and for the largest eigenvalue of the population covariance matrix. Given an independent sample X^i ∈ R^p, i = 1, ..., n, generated from an unknown distribution with an unknown covariance matrix Σ_0, our aim is to estimate the first vector of loadings and the largest eigenvalue of Σ_0 in a setting where p ≫ n. Besides the high dimensionality, another challenge lies in the inherent non-convexity of the problem. We base our methodology on a Lasso-penalized M-estimator which, despite non-convexity, may be solved by a polynomial-time algorithm such as coordinate or gradient descent. We show that our estimator achieves the minimax optimal rates in the ℓ_1 and ℓ_2 norms. We identify the bias in the Lasso-based estimator and propose a de-biased sparse PCA estimator for the vector of loadings and for the largest eigenvalue of the covariance matrix Σ_0. Our main results provide theoretical guarantees for asymptotic normality of the de-biased estimator. The major conditions we impose are sparsity of small order √n / log p in the first eigenvector and sparsity of the same order in the columns of the inverse Hessian matrix of the population risk.
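To make the idea of a Lasso-penalized, non-convex M-estimator solved by gradient descent more concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes one common formulation of the problem: minimize (1/4)‖Σ̂ − ββᵀ‖_F² + λ‖β‖₁ by proximal gradient descent, so that β/‖β‖ estimates the first loadings vector and ‖β‖² estimates the largest eigenvalue. The function names, the initialization from the leading sample eigenvector, the step size, the value of λ, and the simulated spiked-covariance data are all illustrative assumptions. The de-biasing step is not shown.

```python
import numpy as np


def soft_threshold(x, t):
    """Elementwise soft-thresholding, the proximal map of t * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def sparse_pca_prox_grad(sigma_hat, lam, beta_init, step, n_iter=500):
    """Proximal gradient descent on the (assumed) objective
    (1/4) * ||sigma_hat - beta beta^T||_F^2 + lam * ||beta||_1.

    The gradient of the smooth part is ||beta||^2 * beta - sigma_hat @ beta.
    The problem is non-convex, so the result depends on beta_init.
    """
    beta = beta_init.copy()
    for _ in range(n_iter):
        grad = (beta @ beta) * beta - sigma_hat @ beta
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta


# --- Simulated example (all values below are illustrative assumptions) ---
rng = np.random.default_rng(0)
n, p, s, theta = 200, 100, 5, 10.0

# Sparse first eigenvector with s non-zero loadings.
v_true = np.zeros(p)
v_true[:s] = 1.0 / np.sqrt(s)

# Spiked model: Cov(X) = I + theta * v v^T, largest eigenvalue 1 + theta.
z = rng.standard_normal(n)
X = rng.standard_normal((n, p)) + np.sqrt(theta) * z[:, None] * v_true[None, :]
sigma_hat = X.T @ X / n  # sample covariance (mean known to be zero)

# Initialize at the leading sample eigenvector, scaled so that
# ||beta||^2 equals the leading sample eigenvalue.
eigvals, eigvecs = np.linalg.eigh(sigma_hat)  # eigenvalues in ascending order
beta_init = np.sqrt(eigvals[-1]) * eigvecs[:, -1]

beta_hat = sparse_pca_prox_grad(
    sigma_hat,
    lam=1.0,                         # tuning parameter, chosen by hand here
    beta_init=beta_init,
    step=1.0 / (3.0 * eigvals[-1]),  # conservative step size
)

loadings_hat = beta_hat / np.linalg.norm(beta_hat)
eigenvalue_hat = beta_hat @ beta_hat
print("non-zero loadings:", np.count_nonzero(beta_hat))
print("|cosine| with true eigenvector:", abs(loadings_hat @ v_true))
print("largest-eigenvalue estimate:", eigenvalue_hat)
```

The ℓ_1 penalty shrinks the estimate, which is the kind of bias the de-biased estimator described in the abstract is designed to correct.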

research
01/16/2019

Tracy-Widom limit for the largest eigenvalue of high-dimensional covariance matrices in elliptical distributions

Let X be an M × N random matrix consisting of independent M-variate ell...
research
08/11/2015

De-biasing the Lasso: Optimal Sample Size for Gaussian Designs

Performing statistical inference in high-dimension is an outstanding cha...
research
12/24/2014

Inference for Sparse Conditional Precision Matrices

Given n i.i.d. observations of a random vector (X,Z), where X is a high-...
research
10/01/2018

Integrated Principal Components Analysis

Data integration, or the strategic analysis of multiple sources of data ...
research
08/28/2020

Exact and Approximation Algorithms for Sparse PCA

Sparse PCA (SPCA) is a fundamental model in machine learning and data an...
research
02/23/2012

Optimal detection of sparse principal components in high dimension

We perform a finite sample analysis of the detection levels for sparse p...
research
06/01/2016

Graph-Guided Banding of the Covariance Matrix

Regularization has become a primary tool for developing reliable estimat...
