Robust subset selection

05/17/2020
by   Ryan Thompson, et al.
0

The best subset selection (or "best subsets") estimator is a classic tool for sparse regression, and developments in mathematical optimization over the past decade have made it more computationally tractable than ever. Notwithstanding its desirable statistical properties, the best subsets estimator is susceptible to outliers and can break down in the presence of a single contaminated data point. To address this issue, we propose a robust adaption of best subsets that is highly resistant to contamination in both the response and the predictors. Our estimator generalizes the notion of subset selection to both predictors and observations, thereby achieving robustness in addition to sparsity. This procedure, which we call "robust subset selection" (or "robust subsets"), is defined by a combinatorial optimization problem for which we apply modern discrete optimization methods. We formally establish the robustness of our estimator in terms of the finite-sample breakdown point of its objective value. In support of this result, we report experiments on both synthetic and real data that demonstrate the superiority of robust subsets over best subsets in the presence of contamination. Importantly, robust subsets fares competitively across several metrics compared with popular robust adaptions of the Lasso.

READ FULL TEXT

page 20

page 21

page 22

page 23

page 25

research
08/10/2017

Subset Selection with Shrinkage: Sparse Linear Modeling when the SNR is low

We study the behavior of a fundamental tool in sparse statistical modeli...
research
04/18/2023

The Adaptive τ-Lasso: Its Robustness and Oracle Properties

This paper introduces a new regularized version of the robust τ-regressi...
research
10/18/2022

A Note on Robust Subsets of Transversal Matroids

Robust subsets of matroids were introduced by Huang and Sellier to propo...
research
06/20/2022

Measuring the Effect of Training Data on Deep Learning Predictions via Randomized Experiments

We develop a new, principled algorithm for estimating the contribution o...
research
09/14/2021

Rationales for Sequential Predictions

Sequence models are a critical component of modern NLP systems, but thei...
research
09/14/2022

The Fragility of Multi-Treebank Parsing Evaluation

Treebank selection for parsing evaluation and the spurious effects that ...
research
09/20/2016

Robust Estimation of Multiple Inlier Structures

The robust estimator presented in this paper processes each structure in...

Please sign up or login with your details

Forgot password? Click here to reset