Perturbation Analysis of Randomized SVD and its Applications to High-dimensional Statistics
Randomized singular value decomposition (RSVD) is a class of computationally efficient algorithms for computing the truncated SVD of large data matrices. Given an n × n symmetric matrix 𝐌, the prototypical RSVD algorithm outputs an approximation of the k leading singular vectors of 𝐌 by computing the SVD of 𝐌^g𝐆; here g ≥ 1 is an integer and 𝐆∈ℝ^n × k is a random Gaussian sketching matrix. In this paper we study the statistical properties of RSVD under a general "signal-plus-noise" framework, i.e., the observed matrix 𝐌̂ is assumed to be an additive perturbation of some true but unknown signal matrix 𝐌. We first derive upper bounds for the ℓ_2 (spectral norm) and ℓ_2→∞ (maximum row-wise ℓ_2 norm) distances between the approximate singular vectors of 𝐌̂ and the true singular vectors of the signal matrix 𝐌. These upper bounds depend on the signal-to-noise ratio (SNR) and the number of power iterations g. A phase transition phenomenon is observed in which a smaller SNR requires larger values of g to guarantee convergence of the ℓ_2 and ℓ_2→∞ distances. We also show that the thresholds for g at which these phase transitions occur are sharp whenever the noise matrices satisfy a certain trace growth condition. Finally, we derive normal approximations for the row-wise fluctuations of the approximate singular vectors and the entrywise fluctuations of the approximate matrix. We illustrate our theoretical results by deriving nearly-optimal performance guarantees for RSVD when applied to three statistical inference problems, namely, community detection, matrix completion, and principal component analysis with missing data.
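The prototypical RSVD algorithm described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it draws an n × k Gaussian sketching matrix 𝐆, forms 𝐌^g𝐆 by g repeated multiplications (the power iterations), and returns the left singular vectors of the resulting n × k matrix as the approximation of the k leading singular vectors of 𝐌. The function name `rsvd` and its signature are illustrative choices.

```python
import numpy as np

def rsvd(M, k, g=2, seed=None):
    """Prototypical randomized SVD sketch (illustrative, assumed form).

    Approximates the k leading singular vectors of a symmetric n x n
    matrix M by computing the SVD of M^g @ G, where G is an n x k
    Gaussian sketching matrix and g >= 1 is the number of power iterations.
    """
    rng = np.random.default_rng(seed)
    n = M.shape[0]
    G = rng.standard_normal((n, k))   # random Gaussian sketching matrix
    Y = G
    for _ in range(g):                # form M^g @ G by repeated multiplication
        Y = M @ Y
    # Left singular vectors of the n x k matrix M^g @ G
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    return U
```

In the noiseless rank-k case (𝐌 = 𝐕𝐃𝐕^T with 𝐃 a k × k diagonal matrix), the column space of 𝐌^g𝐆 coincides almost surely with the column space of 𝐕, so the returned vectors span the true leading subspace exactly; the paper's analysis concerns how this degrades when only a noisy perturbation 𝐌̂ is observed.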