Evaluating the Robustness of Targeted Maximum Likelihood Estimators via Realistic Simulations in Nutrition Intervention Trials

by Haodong Li, et al.

Several recently developed methods have the potential to harness machine learning in the pursuit of target quantities inspired by causal inference, including inverse weighting, doubly robust estimating equations, and substitution estimators such as targeted maximum likelihood estimation. More recent augmentations of these procedures can increase robustness by adding a layer of cross-validation: cross-validated targeted maximum likelihood estimation for the substitution approach and double machine learning for the estimating equation approach. While these methods have been evaluated individually on simulated and experimental data sets, a comprehensive analysis of their performance across “real-world” simulations has yet to be conducted. In this work, we benchmark multiple widely used methods for estimating the average treatment effect using data from ten different nutrition intervention studies. To better mimic the complexity of the true data-generating distribution, we build a realistic set of simulations on a novel method, the highly adaptive lasso, which estimates the data-generating distribution while guaranteeing a certain level of complexity (undersmoothing). We apply this method to estimate the data-generating distribution separately for each study, and then use these fits to simulate data and to estimate treatment effect parameters together with their standard errors and resulting confidence intervals. Based on the results, we recommend the cross-validated variants of both substitution and estimating equation estimators. We conclude that the additional layer of cross-validation helps avoid unintentional over-fitting of nuisance parameter functionals and leads to more robust inferences.
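To make the idea of an extra cross-validation layer concrete, the sketch below shows a cross-fitted doubly robust (AIPW) estimator of the average treatment effect, in the spirit of the estimating equation approach. It is an illustration only and not the authors' implementation: the data generator `simulate`, the helper names, the toy true effect of 1.0, and the use of stratified sample means in place of machine-learning nuisance fits are all assumptions made for this example. The key point is that each observation's score is computed from nuisance estimates fitted on the other folds.

```python
import random
import statistics


def simulate(n, seed=0):
    """Hypothetical toy data: binary covariate W, treatment A, outcome Y.

    The true average treatment effect is 1.0 by construction.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        w = 1 if rng.random() < 0.5 else 0
        a = 1 if rng.random() < (0.3 + 0.4 * w) else 0
        y = 1.0 * a + 2.0 * w + rng.gauss(0.0, 1.0)
        data.append((w, a, y))
    return data


def fit_nuisance(train):
    """Stratified sample means, standing in for machine-learning fits."""
    g = {}  # g[w] estimates P(A = 1 | W = w)
    q = {}  # q[(a, w)] estimates E[Y | A = a, W = w]
    for w in (0, 1):
        rows = [r for r in train if r[0] == w]
        g[w] = sum(r[1] for r in rows) / len(rows)
        for a in (0, 1):
            ys = [r[2] for r in rows if r[1] == a]
            q[(a, w)] = sum(ys) / len(ys)
    return g, q


def cross_fit_aipw(data, k=5, seed=1):
    """Cross-fitted AIPW estimate of the ATE with a Wald-type 95% interval."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        # Nuisance functions are fitted only on data outside the current fold.
        train = [data[i] for i in idx if i not in held_out]
        g, q = fit_nuisance(train)
        for i in fold:
            w, a, y = data[i]
            # Inverse-probability-weighted residual correction term.
            aug = (a / g[w] - (1 - a) / (1 - g[w])) * (y - q[(a, w)])
            scores.append(q[(1, w)] - q[(0, w)] + aug)
    n = len(scores)
    ate = sum(scores) / n
    se = statistics.stdev(scores) / n ** 0.5
    return ate, se, (ate - 1.96 * se, ate + 1.96 * se)


if __name__ == "__main__":
    ate, se, ci = cross_fit_aipw(simulate(2000))
    print(f"ATE estimate: {ate:.3f}, SE: {se:.3f}, 95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```

With flexible learners in place of the stratified means, the same fold structure is what keeps an over-fitted nuisance estimate from being evaluated on the observations it was trained on.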




