Accounting for Significance and Multicollinearity in Building Linear Regression Models
We derive explicit Mixed Integer Optimization (MIO) constraints that impose significance and avoid multicollinearity when building linear regression models, rather than imposing these properties iteratively in a cutting plane framework. In this way we extend and improve the research program initiated in Bertsimas and King (2016), which imposes sparsity, robustness, pairwise collinearity and group sparsity explicitly, but imposes significance and the avoidance of multicollinearity iteratively. We present a variety of computational results on real and synthetic datasets suggesting that the proposed MIO has a significant edge over Bertsimas and King (2016) in accuracy, false detection rate and computational time when accounting for significance and multicollinearity, while also providing a holistic framework for producing regression models with desirable properties a priori.
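For orientation, the block below is a minimal sketch of the kind of explicit MIO formulation the abstract builds on. It assumes the common big-M model of sparse regression and is not the formulation derived in this paper. In particular, the new significance and multicollinearity constraints are not reproduced here. The symbols are illustrative assumptions: $z_i$ indicates whether feature $i$ is selected, $M$ bounds the coefficient magnitudes, $k$ is the sparsity budget, $\Gamma$ is a ridge-type weight standing in for robustness, $\mathcal{HC}$ is the set of feature pairs whose absolute correlation exceeds a chosen threshold, and $\mathcal{G}_1,\dots,\mathcal{G}_g$ are user-specified feature groups.

```latex
\begin{aligned}
\min_{\beta \in \mathbb{R}^p,\ z \in \{0,1\}^p}\quad
  & \tfrac{1}{2}\,\lVert y - X\beta \rVert_2^2 + \tfrac{\Gamma}{2}\,\lVert \beta \rVert_2^2
  && \text{(fit plus ridge-type robustness term)} \\
\text{s.t.}\quad
  & -M z_i \le \beta_i \le M z_i, && i = 1,\dots,p \quad \text{(}\beta_i = 0 \text{ unless feature } i \text{ is selected)} \\
  & \textstyle\sum_{i=1}^{p} z_i \le k && \text{(sparsity)} \\
  & z_i + z_j \le 1, && (i,j) \in \mathcal{HC} \quad \text{(pairwise collinearity)} \\
  & z_i = z_j, && i, j \in \mathcal{G}_m,\ m = 1,\dots,g \quad \text{(group sparsity)}
\end{aligned}
```

Each property in this sketch is a linear constraint on the binary indicators $z$. That is what "explicit" means here: a single MIO solve enforces the property directly, with no cutting plane loop that adds violated constraints after each solve.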