Stabilizing Variable Selection and Regression

11/05/2019
by Niklas Pfister, et al.

We consider regression in which one predicts a response Y from a set of predictors X across different experiments or environments. This is a common setup in many data-driven scientific fields, and we argue that statistical inference can benefit from an analysis that takes into account the distributional changes across environments. In particular, it is useful to distinguish between stable and unstable predictors, i.e., predictors whose functional dependence on the response is fixed or changes across environments, respectively. We introduce stabilized regression, which explicitly enforces stability and thus improves generalization performance to previously unseen environments. Our work is motivated by an application in systems biology. Using multiomic data, we demonstrate how hypothesis generation about gene function can benefit from stabilized regression. We believe that a similar line of argument for exploiting heterogeneity in data can be powerful for many other applications as well. We draw a theoretical connection between multi-environment regression and causal models, which allows us to characterize stable versus unstable functional dependence on the response graphically. Formally, we introduce the notion of a stable blanket, which is a subset of the predictors that lies between the direct causal predictors and the Markov blanket. We prove that this set is optimal in the sense that a regression based on these predictors minimizes the mean squared prediction error under the requirement that the resulting regression generalizes to previously unseen environments.
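The distinction between stable and unstable predictors can be illustrated with a small simulation. The sketch below is not the stabilized regression estimator from the paper; it is a minimal example under an assumed data-generating process (X1 causes Y, Y causes X2, and the strength of the effect of Y on X2 differs between two environments). The function names `simulate_env` and `ols`, the coefficients, and the sample sizes are all hypothetical choices made for illustration. It fits ordinary least squares separately in each environment and compares how much the fitted coefficients move.

```python
import numpy as np

def simulate_env(n, shift, rng):
    # Assumed causal chain (illustration only): X1 -> Y -> X2.
    # The mechanism generating X2 depends on the environment via `shift`.
    x1 = rng.normal(size=n)
    y = 2.0 * x1 + rng.normal(size=n)
    x2 = shift * y + rng.normal(size=n)
    return np.column_stack([x1, x2]), y

def ols(X, y):
    # Least-squares coefficients (intercept first, then slopes).
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
# Two environments that differ only in how strongly Y affects X2.
envs = [simulate_env(5000, shift, rng) for shift in (0.5, 2.0)]

spreads = {}
for name, subset in [("X1 only", [0]), ("X1 and X2", [0, 1])]:
    coefs = [ols(X[:, subset], y) for X, y in envs]
    # Largest change in any fitted coefficient between the environments.
    spreads[name] = float(np.max(np.abs(coefs[0] - coefs[1])))
    print(f"{name}: max coefficient change across environments = {spreads[name]:.3f}")
```

In this toy setting the regression on X1 alone gives nearly identical coefficients in both environments, whereas adding X2 makes the fitted coefficients change substantially, so a model that includes X2 and is fitted in one environment would not be expected to transfer to the other.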


