Efficient Clustering of Correlated Variables and Variable Selection in High-Dimensional Linear Models

03/11/2016
by Niharika Gauraha, et al.

In this paper, we introduce the Adaptive Cluster Lasso (ACL), a method for variable selection in high-dimensional sparse regression models with strongly correlated variables. A widely accepted way to handle correlated variables is to first cluster (group) them and then fit the model. When the dimension is very high, however, finding an appropriate group structure is as difficult as the original problem. ACL is a three-stage procedure. In the first stage, we use the Lasso (or its adaptive or thresholded version) for initial selection, and we additionally include the variables that the Lasso did not select but that are strongly correlated with those it did select. In the second stage, we cluster the variables in this reduced set of predictors. In the third stage, we perform sparse estimation, such as the Lasso on cluster representatives or the group Lasso on the group structure produced by the clustering. We show that, under the irrepresentable and beta-min conditions, our procedure is consistent and efficient in finding the true underlying population group structure. We also study the group selection consistency of our method and support the theory with simulated and pseudo-real data examples.
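The three stages described above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the plain Lasso (fitted by cyclic coordinate descent) in the first and third stages, treats a variable as "strongly correlated" when its absolute sample correlation with a Lasso-selected variable is at least a hypothetical threshold `tau`, clusters the reduced set by single linkage at that same threshold (connected components of the graph with edges |corr| >= `tau`), and represents each cluster by the average of its standardized columns, which presumes that within-cluster correlations are positive. All function names and the penalty levels `lam1` and `lam2` are hypothetical.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=1000, tol=1e-8):
    """Lasso by cyclic coordinate descent.

    Minimizes (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1, assuming X and y
    are already centered (no intercept is fitted).
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = np.array(y, dtype=float)  # current residual y - X b (b starts at 0)
    for _ in range(n_iter):
        max_delta = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (b_j removed).
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            # Soft-thresholding update for coordinate j.
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            delta = new - b[j]
            if delta != 0.0:
                r -= X[:, j] * delta
                b[j] = new
                max_delta = max(max_delta, abs(delta))
        if max_delta < tol:
            break
    return b


def correlated_closure(X, selected, tau):
    """Stage 1b: add every variable whose |corr| with a selected one is >= tau.

    tau is a hypothetical tuning parameter standing in for "strongly correlated".
    """
    C = np.abs(np.corrcoef(X, rowvar=False))
    keep = {int(j) for j in selected}
    for j in selected:
        keep.update(np.flatnonzero(C[int(j)] >= tau).tolist())
    return sorted(keep)


def threshold_clusters(X, idx, tau):
    """Stage 2 (assumed rule): single-linkage clustering of the reduced set,
    i.e. connected components of the graph with edges |corr| >= tau."""
    if not idx:
        return []
    C = np.atleast_2d(np.abs(np.corrcoef(X[:, idx], rowvar=False)))
    labels = [-1] * len(idx)
    clusters = []
    for start in range(len(idx)):
        if labels[start] >= 0:
            continue
        labels[start] = len(clusters)
        stack, members = [start], []
        while stack:  # depth-first walk over one connected component
            i = stack.pop()
            members.append(idx[i])
            for k in np.flatnonzero(C[i] >= tau):
                if labels[k] < 0:
                    labels[k] = len(clusters)
                    stack.append(k)
        clusters.append(sorted(members))
    return clusters


def adaptive_cluster_lasso(X, y, lam1, lam2, tau):
    """Three-stage sketch; returns (all clusters, clusters kept in stage 3)."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize predictors
    yc = y - y.mean()
    # Stage 1: initial Lasso selection, then add strongly correlated variables.
    s1 = np.flatnonzero(lasso_cd(Xs, yc, lam1))
    reduced = correlated_closure(Xs, s1, tau)
    # Stage 2: cluster the reduced set of predictors.
    clusters = threshold_clusters(Xs, reduced, tau)
    if not clusters:
        return [], []
    # Stage 3: Lasso on cluster representatives (here: column averages,
    # which assumes variables within a cluster are positively correlated).
    R = np.column_stack([Xs[:, c].mean(axis=1) for c in clusters])
    b3 = lasso_cd(R, yc, lam2)
    return clusters, [clusters[k] for k in np.flatnonzero(b3)]


if __name__ == "__main__":
    # Synthetic example: two blocks of three near-duplicate predictors plus noise.
    rng = np.random.default_rng(0)
    n = 200
    z1, z2 = rng.standard_normal(n), rng.standard_normal(n)
    block1 = [z1 + 0.1 * rng.standard_normal(n) for _ in range(3)]
    block2 = [z2 + 0.1 * rng.standard_normal(n) for _ in range(3)]
    X = np.column_stack(block1 + block2 + [rng.standard_normal((n, 14))])
    y = 3 * z1 - 2 * z2 + 0.5 * rng.standard_normal(n)
    clusters, chosen = adaptive_cluster_lasso(X, y, lam1=0.3, lam2=0.3, tau=0.8)
    print("clusters:", clusters)
    print("selected clusters:", chosen)
```

Reusing `tau` for both the correlation screen and the clustering keeps the sketch short; the paper's clustering procedure and tuning may differ, and the group Lasso over the discovered clusters is the alternative third stage that the abstract mentions.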
