AgFlow: Fast Model Selection of Penalized PCA via Implicit Regularization Effects of Gradient Flow

by Haiyan Jiang, et al.

Principal component analysis (PCA) is widely used as an effective technique for feature extraction and dimension reduction. In the High Dimension Low Sample Size (HDLSS) setting, one may prefer modified principal components with penalized loadings, together with automated penalty selection, that is, model selection among the candidate models obtained under varying penalties. Earlier work [1, 2] proposed penalized PCA and indicated that model selection in L_2-penalized PCA is feasible through the solution path of Ridge regression; however, that approach is extremely time-consuming because of the intensive matrix inversion it requires. In this paper, we propose a fast model selection method for penalized PCA, named Approximated Gradient Flow (AgFlow), which lowers the computational complexity by exploiting the implicit regularization effect introduced by (stochastic) gradient flow [3, 4] and obtains the complete solution path of L_2-penalized PCA under varying L_2-regularization. We perform extensive experiments on real-world datasets. AgFlow outperforms existing methods (Oja [5], Power [6], Shamir [7], and the vanilla Ridge estimators) in terms of computation costs.
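
The general idea behind replacing a Ridge solution path with a gradient-based trajectory can be illustrated on a toy least-squares problem. The sketch below is not the AgFlow algorithm and does not use the paper's PCA objective; it only assumes a generic regression with random data, plain gradient descent started at zero as a discretization of gradient flow, and the heuristic pairing of elapsed time t with Ridge penalty lambda = 1/t. All function names, the step size, and the problem sizes are illustrative choices. It compares the iterates produced by one cheap gradient run against Ridge estimates that each require a separate linear solve.

```python
import numpy as np


def ridge_path(X, y, lambdas):
    """Closed-form Ridge estimates, one linear solve per penalty value.

    Minimizes (1/2n)||y - X b||^2 + (lam/2)||b||^2 for each lam.
    """
    n, p = X.shape
    gram = X.T @ X / n
    rhs = X.T @ y / n
    return np.array([np.linalg.solve(gram + lam * np.eye(p), rhs) for lam in lambdas])


def gradient_descent_path(X, y, step, n_iters):
    """Iterates of plain gradient descent on (1/2n)||y - X b||^2, started at zero."""
    n, p = X.shape
    beta = np.zeros(p)
    path = []
    for _ in range(n_iters):
        grad = X.T @ (X @ beta - y) / n
        beta = beta - step * grad
        path.append(beta.copy())
    return np.array(path)


# Toy HDLSS-style problem: more features than samples (illustrative sizes).
rng = np.random.default_rng(0)
n, p = 20, 50
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# A small step relative to the largest curvature keeps the discretization
# close to continuous-time gradient flow.
largest_eig = np.linalg.eigvalsh(X.T @ X / n).max()
step = 0.1 / largest_eig
n_iters = 200

gd = gradient_descent_path(X, y, step, n_iters)
times = step * np.arange(1, n_iters + 1)   # elapsed "flow time" of each iterate
ridge = ridge_path(X, y, 1.0 / times)      # assumed pairing: lambda = 1 / t

# Gap between the two paths, relative to the minimum-norm least-squares solution.
beta_min_norm = np.linalg.pinv(X) @ y
rel_gap = np.linalg.norm(gd - ridge, axis=1) / np.linalg.norm(beta_min_norm)
gd_norms = np.linalg.norm(gd, axis=1)
ridge_norms = np.linalg.norm(ridge, axis=1)

print("largest relative gap along the path:", rel_gap.max())
```

In this sketch the gradient run costs one matrix-vector pass per iterate, while the Ridge path solves a p-by-p system for every penalty value. The two paths are close but not identical, so the pairing lambda = 1/t should be read as an approximation under the stated assumptions.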




The Stochastic Complexity of Principal Component Analysis

PCA (principal component analysis) and its variants are ubiquitous techn...

Automatic dimensionality selection for principal component analysis models with the ignorance score

Principal component analysis (PCA) is by far the most widespread tool fo...

Toroidal PCA via density ridges

Principal Component Analysis (PCA) is a well-known linear dimension-redu...

Data Distillery: Effective Dimension Estimation via Penalized Probabilistic PCA

The paper tackles the unsupervised estimation of the effective dimension...

Group Invariance and Computational Sufficiency

Statistical sufficiency formalizes the notion of data reduction. In the ...

Validation of nonlinear PCA

Linear principal component analysis (PCA) can be extended to a nonlinear...

FADI: Fast Distributed Principal Component Analysis With High Accuracy for Large-Scale Federated Data

Principal component analysis (PCA) is one of the most popular methods fo...
