Powerful Knockoffs via Minimizing Reconstructability

11/30/2020
by Asher Spector, et al.

Model-X knockoffs allow analysts to perform feature selection using almost any machine learning algorithm while still provably controlling the expected proportion of false discoveries. To apply model-X knockoffs, one must construct synthetic variables, called knockoffs, which effectively act as controls during feature selection. The gold standard for constructing knockoffs has been to minimize the mean absolute correlation (MAC) between features and their knockoffs, but, surprisingly, we prove this procedure can be powerless in extremely easy settings, including Gaussian linear models with correlated exchangeable features. The key problem is that minimizing the MAC creates strong joint dependencies between the features and knockoffs, which allow machine learning algorithms to partially or fully reconstruct the effect of the features on the response using the knockoffs. To improve the power of knockoffs, we propose generating knockoffs that minimize the reconstructability of the features (MRC knockoffs), and we demonstrate our proposal for Gaussian features by showing that it is computationally efficient, robust, and powerful. We also prove that certain MRC knockoffs minimize a natural definition of estimation error in Gaussian linear models. Furthermore, in an extensive set of simulations, we find many settings with correlated features in which MRC knockoffs dramatically outperform MAC-minimizing knockoffs, and no settings in which MAC-minimizing knockoffs outperform MRC knockoffs by more than a very slight margin. We implement our methods and a host of others from the knockoffs literature in a new open-source Python package, knockpy.
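The joint dependence described in the abstract can be illustrated numerically. The following is a minimal sketch using only NumPy, not the knockpy API. It assumes the standard Gaussian knockoff construction, in which features and knockoffs have joint covariance G = [[Σ, Σ − S], [Σ − S, Σ]] with S = diag(s), and it assumes that for exchangeable features with correlation ρ the MAC-minimizing choice is the equicorrelated one, s = min(1, 2λ_min(Σ)). The values p = 5 and ρ = 0.6, the helper name `joint_cov`, and the comparison value `s_alt` are hypothetical and chosen only for illustration; `s_alt` is not the paper's MRC solution.

```python
import numpy as np

def joint_cov(Sigma, s):
    """Joint covariance of (X, X_knockoff) for Gaussian knockoffs with S = diag(s)."""
    S = np.diag(s)
    return np.block([[Sigma, Sigma - S], [Sigma - S, Sigma]])

# Exchangeable (equicorrelated) features; p and rho are illustrative values.
p, rho = 5, 0.6
Sigma = (1 - rho) * np.eye(p) + rho * np.ones((p, p))

# Assumed MAC-minimizing choice for this Sigma: equicorrelated s.
lam_min = np.linalg.eigvalsh(Sigma)[0]
s_mac = np.full(p, min(1.0, 2 * lam_min))

# A smaller, hypothetical s for comparison (NOT the paper's MRC solution).
s_alt = 0.5 * s_mac

G_mac = joint_cov(Sigma, s_mac)
G_alt = joint_cov(Sigma, s_alt)

# Smallest eigenvalue of the joint covariance: near zero means that some
# linear combination of features and knockoffs is deterministic.
min_eig_mac = np.linalg.eigvalsh(G_mac)[0]
min_eig_alt = np.linalg.eigvalsh(G_alt)[0]

# Variance of (X_1 + Xk_1) - (X_2 + Xk_2) under the MAC choice.
v = np.zeros(p)
v[0], v[1] = 1.0, -1.0
w = np.concatenate([v, v])
var_mac = w @ G_mac @ w

print("min eigenvalue, MAC-style s:", min_eig_mac)
print("min eigenvalue, smaller s:  ", min_eig_alt)
print("Var[(X1 + Xk1) - (X2 + Xk2)], MAC-style s:", var_mac)
```

Under these assumptions, the MAC-style joint covariance is singular when ρ ≥ 0.5, so a combination such as X_1 + X̃_1 − X_2 − X̃_2 has zero variance. This is the kind of exact linear dependence that lets a model recover feature effects from the knockoffs. Shrinking s makes each knockoff more correlated with its own feature but keeps the joint covariance well conditioned.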
