Robust selection of predictors and conditional outlier detection in a perturbed large-dimensional regression context
This paper presents a fast methodology, called ROBOUT, to identify outliers in a response variable conditional on a set of linearly related predictors, retrieved from a large granular dataset. ROBOUT is shown to be effective and particularly versatile compared to existing methods in the presence of a number of data idiosyncratic features. ROBOUT is able to identify observations with outlying conditional variance when the dataset contains element-wise sparse variables, and the set of predictors contains multivariate outliers. Existing integrated methodologies like SPARSE-LTS and RLARS are systematically sub-optimal under those conditions. ROBOUT entails a robust selection stage of the statistically relevant predictors (by using a Huber or a quantile loss), the estimation of a robust regression model based on the selected predictors (by LTS, GS or MM), and a criterion to identify conditional outliers based on a robust measure of the residuals' dispersion. We conduct a comprehensive simulation study in which the different variants of the proposed algorithm are tested under an exhaustive set of different perturbation scenarios. The methodology is also applied to a granular supervisory banking dataset collected by the European Central Bank.
READ FULL TEXT