Off-Policy Evaluation with Policy-Dependent Optimization Response

by Wenshuo Guo, et al.

The intersection of causal inference and machine learning for decision-making is rapidly expanding, but the default decision criterion remains an average of individual causal outcomes across a population. In practice, various operational restrictions ensure that a decision-maker's utility is not realized as an average but rather as the output of a downstream decision-making problem (such as matching, assignment, network flow, or minimizing predictive risk). In this work, we develop a new framework for off-policy evaluation with a policy-dependent linear optimization response, in which causal outcomes introduce stochasticity in the objective function coefficients. In this framework, a decision-maker's utility depends on the policy-dependent optimization, which introduces a fundamental challenge of optimization bias even for policy evaluation. We construct unbiased estimators for the policy-dependent estimand using a perturbation method. We also discuss the asymptotic variance properties of a set of plug-in regression estimators adjusted to be compatible with that perturbation method. Lastly, attaining unbiased policy evaluation allows for policy optimization, and we provide a general algorithm for optimizing causal interventions. We corroborate our theoretical results with numerical simulations.
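The optimization bias mentioned above can be illustrated with a small simulation. The sketch below is not the paper's estimator or its perturbation method; it is a minimal, assumed toy setup (a linear objective minimized over the probability simplex, Gaussian noise on the coefficients, and hypothetical names such as `plug_in_value` and `simulate_bias`) showing why plugging noisy coefficient estimates into a downstream linear program tends to give an optimistically low optimal value, even when each coefficient estimate is itself unbiased.

```python
import numpy as np


def plug_in_value(c_hat):
    """Optimal value of min_x c_hat @ x over the probability simplex.

    A linear objective over the simplex is minimized at a vertex,
    so the optimal value is simply the smallest coefficient.
    """
    return float(np.min(c_hat))


def simulate_bias(c_true, noise_sd=1.0, n_reps=20000, seed=0):
    """Compare the true optimal value with the average plug-in value.

    Assumption (illustrative only): each replication observes
    c_hat = c_true + Gaussian noise, an unbiased estimate of c_true.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, noise_sd, size=(n_reps, len(c_true)))
    c_hats = c_true + noise
    # Plug-in optimal value for every replication at once.
    plug_in_values = c_hats.min(axis=1)
    true_value = plug_in_value(c_true)
    return true_value, float(plug_in_values.mean())


# Hypothetical objective coefficients for four candidate decisions.
c_true = np.array([1.0, 1.2, 1.5, 2.0])
true_value, mean_plug_in = simulate_bias(c_true)

print(f"true optimal value:       {true_value:.3f}")
print(f"mean plug-in optimal value: {mean_plug_in:.3f}")
print(f"estimated bias:           {mean_plug_in - true_value:.3f}")
```

Because the minimum is a concave function of the coefficients, the expected plug-in value cannot exceed the true optimal value (Jensen's inequality), so the estimated bias printed here is negative. This is the kind of gap that an unbiased estimator of the policy-dependent estimand has to correct.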




Applying causal inference to inform early-childhood policy from administrative data

Improving public policy is one of the key roles of governments, and they...

Causal Inference in High Dimensions – Without Sparsity

We revisit the classical causal inference problem of estimating the aver...

Note on the Delta Method for Finite Population Inference with Applications to Causal Inference

This work derives a finite population delta method. The delta method cre...

The foundations of cost-sensitive causal classification

Classification is a well-studied machine learning task which concerns th...

Efficient Learning for Clustering and Optimizing Context-Dependent Designs

We consider a simulation optimization problem for a context-dependent de...
