Variable selection in linear regression models: choosing the best subset is not always the best choice

02/23/2023
by Moritz Hanke, et al.

Variable selection in linear regression settings is a much-discussed problem. Best subset selection (BSS) is often considered the intuitive 'gold standard', with its use restricted only by its NP-hard nature. Alternatives such as the least absolute shrinkage and selection operator (Lasso) or the elastic net (Enet) have become methods of choice in high-dimensional settings. A recent proposal represents BSS as a mixed integer optimization problem, making much larger problems feasible in reasonable computation time. We present an extensive neutral comparison assessing the variable selection performance, in linear regression, of BSS compared to forward stepwise selection (FSS), Lasso and Enet. The simulation study considers a wide range of settings that are challenging with regard to dimensionality (with respect to the number of observations and variables), signal-to-noise ratios and correlations between predictors. As the main measure of performance, we used the best possible F1-score for each method to ensure a fair comparison irrespective of any criterion for choosing the tuning parameters; results were confirmed by alternative performance measures. Somewhat surprisingly, even in low-dimensional settings, BSS reliably outperformed the other methods only where the signal-to-noise ratio was high and the variables were (nearly) uncorrelated. Further, the performance of FSS was nearly identical to that of BSS. Our results shed new light on the usual presumption that BSS is, in principle, the best choice for variable selection. Especially for correlated variables, alternatives like Enet are faster and appear to perform better in practical settings.
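To make the "best possible F1-score" idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a selection method yields a sequence of candidate variable sets (here produced by a simple forward stepwise procedure written with NumPy) and that the true support is known, as it is in a simulation. The function names `f1_score_support`, `forward_stepwise_path` and `best_possible_f1`, the absence of an intercept, and the simulated data are all illustrative assumptions.

```python
import numpy as np

def f1_score_support(selected, true_support):
    """F1-score of a selected variable set against the true support."""
    selected, true_support = set(selected), set(true_support)
    tp = len(selected & true_support)  # correctly selected variables
    if tp == 0:
        return 0.0
    precision = tp / len(selected)
    recall = tp / len(true_support)
    return 2 * precision * recall / (precision + recall)

def forward_stepwise_path(X, y, max_size):
    """Greedy forward selection: at each step add the variable that
    gives the smallest residual sum of squares (no intercept, by assumption)."""
    p = X.shape[1]
    selected, path = [], []
    for _ in range(max_size):
        best_j, best_rss = None, np.inf
        for j in range(p):
            if j in selected:
                continue
            cols = selected + [j]
            coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            resid = y - X[:, cols] @ coef
            rss = float(resid @ resid)
            if rss < best_rss:
                best_j, best_rss = j, rss
        selected.append(best_j)
        path.append(list(selected))  # candidate set of size len(selected)
    return path

def best_possible_f1(path, true_support):
    """Best F1 over all candidate sets, independent of any tuning criterion."""
    return max(f1_score_support(s, true_support) for s in path)

# Illustrative simulation: uncorrelated predictors, high signal-to-noise ratio.
rng = np.random.default_rng(0)
n, p = 200, 10
true_support = [0, 1, 2]
beta = np.zeros(p)
beta[true_support] = 3.0
X = rng.standard_normal((n, p))
y = X @ beta + rng.standard_normal(n)

path = forward_stepwise_path(X, y, max_size=6)
best_f1 = best_possible_f1(path, true_support)
print(best_f1)
```

Taking the maximum over the whole path is what removes the dependence on a tuning criterion: each method is scored by the best candidate set it could have produced, not by the set a particular rule (such as cross-validation) would have picked.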


research
10/11/2017

Variable Selection in Restricted Linear Regression Models

The use of prior information in the linear regression is well known to p...
research
11/05/2010

The Loss Rank Criterion for Variable Selection in Linear Regression Analysis

Lasso and other regularization procedures are attractive methods for var...
research
07/27/2017

Extended Comparisons of Best Subset Selection, Forward Stepwise Selection, and the Lasso

In exciting new work, Bertsimas et al. (2016) showed that the classical ...
research
01/16/2023

Tale of two c(omplex)ities

For decades, best subset selection (BSS) has eluded statisticians mainly...
research
08/10/2017

Subset Selection with Shrinkage: Sparse Linear Modeling when the SNR is low

We study the behavior of a fundamental tool in sparse statistical modeli...
research
01/05/2022

High-dimensional variable selection with heterogeneous signals: A precise asymptotic perspective

We study the problem of exact support recovery for high-dimensional spar...
research
07/14/2021

On the early solution path of best subset selection

The early solution path, which tracks the first few variables that enter...
