COMBSS: Best Subset Selection via Continuous Optimization
We consider the problem of best subset selection in linear regression, where the goal is to find, for every model size k, the subset of k features that best fits the response. This is particularly challenging when the total number of available features is very large compared to the number of data samples. We propose COMBSS, a novel continuous-optimization-based method that identifies a solution path: a small set of models of varying size that are candidates for the best subset in linear regression. COMBSS turns out to be very fast, making subset selection possible when the number of features is well in excess of thousands. Simulation results are presented to highlight the performance of COMBSS in comparison to existing popular methods such as Forward Stepwise, the Lasso and Mixed-Integer Optimization. Given its outstanding overall performance, framing best subset selection as a continuous optimization problem opens new research directions for feature extraction in a large variety of regression models.
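To make the underlying combinatorial problem concrete, the following is a minimal sketch of exhaustive best subset selection on a tiny synthetic dataset. It is not COMBSS and does not reflect the authors' implementation; it only illustrates the target that a method like COMBSS aims to approximate, and why brute force becomes infeasible as the number of features grows. The function name `best_subsets_exhaustive`, the synthetic data, and all parameter values are assumptions made for illustration.

```python
from itertools import combinations

import numpy as np


def best_subsets_exhaustive(X, y, max_size=None):
    """For each size k, return the k-feature subset with the smallest
    residual sum of squares (RSS), found by trying every subset.

    Hypothetical helper for illustration only: the cost grows as
    C(p, k), so it is usable only when p is very small.
    """
    n, p = X.shape
    max_size = p if max_size is None else min(max_size, p)
    path = {}
    for k in range(1, max_size + 1):
        best_rss, best_idx = np.inf, None
        for idx in combinations(range(p), k):
            Xs = X[:, list(idx)]
            # Ordinary least squares restricted to the chosen columns.
            coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            rss = float(np.sum((y - Xs @ coef) ** 2))
            if rss < best_rss:
                best_rss, best_idx = rss, idx
        path[k] = (best_idx, best_rss)
    return path


# Synthetic example (assumed data): only features 0 and 3 are active.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 6))
beta = np.array([2.0, 0.0, 0.0, -3.0, 0.0, 0.0])
y = X @ beta + 0.1 * rng.standard_normal(50)

path = best_subsets_exhaustive(X, y)
for k, (idx, rss) in path.items():
    print(k, idx, round(rss, 3))
```

With 6 features the search visits only 63 subsets, but the count doubles with every added feature, which is the reason exhaustive search is impractical at the scale the abstract describes.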