A Lasso-OLS Hybrid Approach to Covariate Selection and Average Treatment Effect Estimation for Clustered RCTs Using Design-Based Methods

05/05/2020
by Peter Z. Schochet, et al.

Statistical power is often a concern for clustered RCTs due to variance inflation from design effects and the high cost of adding study clusters (such as hospitals, schools, or communities). While covariate pre-specification is the preferred approach for improving power to estimate regression-adjusted average treatment effects (ATEs), further precision gains can be achieved through covariate selection once primary outcomes have been collected. This article uses design-based methods underlying clustered RCTs to develop a Lasso-OLS hybrid procedure for the post-hoc selection of covariates and ATE estimation that avoids model overfitting and lack of transparency. In the first stage, lasso estimation is conducted using cluster-level averages, where asymptotic normality is proved using a new central limit theorem for finite population regression estimators. In the second stage, ATEs and design-based standard errors are estimated using weighted least squares with the first stage lasso covariates. This nonparametric approach applies to continuous, binary, and discrete outcomes. Simulation results indicate that Type 1 errors of the second stage ATE estimates are near nominal values and standard errors are near true ones, although somewhat conservative with small samples. The method is demonstrated using data from a large, federally funded clustered RCT testing the effects of school-based programs promoting behavioral health.
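The two-stage idea described above can be illustrated with a minimal sketch. It is not the authors' implementation. Several choices here are assumptions made only for illustration: the function names (`lasso_cd`, `center_within_arm`, `lasso_ols_ate`), centering the cluster averages within each treatment arm so that the treatment indicator is not penalized, the fixed penalty `lam`, the use of cluster sizes as weights, and the simulated data. The design-based standard errors developed in the article are not reproduced; only the point estimate is computed.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent (X and y assumed centered).
    Minimizes (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Partial residual that leaves covariate j out of the fit
            r = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r / n
            # Soft-thresholding update
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return beta

def center_within_arm(A, T):
    """Subtract the arm-specific mean from each row (assumption of this sketch)."""
    A = np.asarray(A, dtype=float).copy()
    for arm in (0, 1):
        mask = T == arm
        A[mask] -= A[mask].mean(axis=0)
    return A

def lasso_ols_ate(ybar, T, Xbar, w, lam):
    """ybar: cluster mean outcomes (m,), T: 0/1 assignment (m,),
    Xbar: cluster mean covariates (m, p), w: cluster weights (m,)."""
    # Stage 1: lasso on cluster-level averages to select covariates
    y_c = center_within_arm(ybar, T)
    X_c = center_within_arm(Xbar, T)
    sd = X_c.std(axis=0)
    sd[sd == 0.0] = 1.0
    beta = lasso_cd(X_c / sd, y_c, lam)
    selected = np.flatnonzero(beta != 0.0)

    # Stage 2: weighted least squares on intercept, treatment, selected covariates
    Xs = Xbar[:, selected] - np.average(Xbar[:, selected], axis=0, weights=w)
    Z = np.column_stack([np.ones(len(ybar)), T, Xs])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(Z * sw[:, None], ybar * sw, rcond=None)
    return coef[1], selected  # coefficient on T is the ATE estimate

# Simulated example (hypothetical data): true ATE is 2, only covariate 0 matters
rng = np.random.default_rng(0)
m, p = 200, 5
T = rng.permutation(np.repeat([0, 1], m // 2))
Xbar = rng.normal(size=(m, p))
w = rng.integers(20, 60, size=m)
ybar = 1.0 + 2.0 * T + 3.0 * Xbar[:, 0] + 0.1 * rng.normal(size=m)

ate, selected = lasso_ols_ate(ybar, T, Xbar, w, lam=0.5)
print("selected covariates:", selected, "ATE estimate:", ate)
```

In practice the penalty would be chosen by a data-driven rule rather than fixed, and the second-stage inference would rely on the design-based variance estimator described in the full text.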
