Model selection for estimation of causal parameters

by Dominik Rothenhäusler, et al.

A popular technique for selecting and tuning machine learning estimators is cross-validation. Cross-validation evaluates overall model fit, usually in terms of predictive accuracy. This may lead to models that exhibit good overall predictive accuracy but are suboptimal for estimating causal quantities such as the average treatment effect. We propose a model selection procedure that estimates the mean-squared error of a one-dimensional estimator. The procedure relies on knowing an asymptotically unbiased estimator of the parameter of interest. Under regularity conditions, we show that the proposed criterion has asymptotically equal or lower variance than competing procedures based on sample splitting. In the literature, model selection is often used to choose among models for nuisance parameters, but the identification strategy is usually fixed across models. Here, we use model selection to select among estimators that correspond to different estimands. More specifically, we use model selection to shrink between methods such as augmented inverse probability weighting, regression adjustment, the instrumental variables approach, and difference-in-means. The performance of the approach for estimation and inference for average treatment effects is evaluated on simulated data sets, including experimental data, instrumental variables settings, and observational data with selection on observables.
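To make the idea of scoring estimators by estimated mean-squared error concrete, here is a minimal sketch. It is not the criterion proposed in the paper, whose exact form is not given in this abstract. It assumes a simulated randomized experiment, uses difference-in-means as the asymptotically unbiased reference, and substitutes a plain nonparametric bootstrap for the variance terms. All function names, the data-generating process, and the plug-in formula (squared distance to the reference, debiased by the bootstrap variance of that distance, plus the candidate's own bootstrap variance) are illustrative assumptions.

```python
import numpy as np

def diff_in_means(y, t, x):
    # Unbiased for the average treatment effect under randomization.
    return y[t == 1].mean() - y[t == 0].mean()

def regression_adjustment(y, t, x):
    # OLS of y on intercept, treatment and covariate; return the treatment coefficient.
    design = np.column_stack([np.ones_like(y), t, x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]

def estimated_mse(y, t, x, candidates, reference, n_boot=200, seed=0):
    """Hypothetical plug-in MSE estimate for each candidate estimator.

    bias^2 is approximated by (theta_k - theta_ref)^2 minus the bootstrap
    variance of that difference, truncated at zero; the candidate's own
    bootstrap variance is then added.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    point = {name: f(y, t, x) for name, f in candidates.items()}
    boot = {name: np.empty(n_boot) for name in candidates}
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample units with replacement
        for name, f in candidates.items():
            boot[name][b] = f(y[idx], t[idx], x[idx])
    scores = {}
    for name in candidates:
        diff = point[name] - point[reference]
        var_diff = np.var(boot[name] - boot[reference], ddof=1)
        var_k = np.var(boot[name], ddof=1)
        scores[name] = max(diff ** 2 - var_diff, 0.0) + var_k
    return point, scores

# Simulated experiment (assumed data-generating process, true effect = 2).
rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
t = rng.integers(0, 2, size=n)
y = 2.0 * t + 3.0 * x + rng.normal(size=n)

candidates = {
    "diff_in_means": diff_in_means,
    "regression_adjustment": regression_adjustment,
}
point, scores = estimated_mse(y, t, x, candidates, reference="diff_in_means")
selected = min(scores, key=scores.get)  # candidate with the smallest estimated MSE
```

For the reference estimator itself the squared-distance term vanishes, so its score reduces to its bootstrap variance. A biased candidate is penalized only to the extent that its distance from the reference exceeds what sampling noise would explain.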




