Deep Jump Q-Evaluation for Offline Policy Evaluation in Continuous Action Space

10/29/2020
by Hengrui Cai, et al.

We consider off-policy evaluation (OPE) in continuous action domains, such as dynamic pricing and personalized dose finding. In OPE, one aims to estimate the value of a new policy using historical data generated by a different behavior policy. Most existing work on OPE focuses on discrete action domains. To handle continuous action spaces, we develop a new deep jump Q-evaluation method for OPE. The key ingredient of our method is adaptive discretization of the action space using deep jump Q-learning, which allows existing OPE methods for discrete domains to be applied to continuous actions. Our method is further supported by theoretical results and by experiments on synthetic and real datasets.
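
To make the idea of "discretize, then apply a discrete-action OPE method" concrete, the following is a minimal sketch and not the authors' deep jump Q-evaluation. It assumes a fixed, equal-width partition of a one-dimensional action space (the paper's partition is adaptive and learned), a known behavior policy, and plain inverse propensity scoring as the discrete OPE method. All function names, the synthetic reward model, and the parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def discretized_ips_value(contexts, actions, rewards, target_policy,
                          edges, behavior_interval_prob):
    """Inverse-propensity estimate of a policy's value after binning actions.

    Assumption: `edges` is a fixed increasing grid over the action space.
    The adaptive partition described in the abstract is NOT implemented here.
    """
    n_bins = len(edges) - 1
    # Map each action to the index of the interval that contains it.
    # Clipping keeps the right endpoint of the action space in the last bin.
    obs_bin = np.clip(np.digitize(actions, edges) - 1, 0, n_bins - 1)
    tgt_bin = np.clip(np.digitize(target_policy(contexts), edges) - 1,
                      0, n_bins - 1)
    # A sample counts only if the logged action fell in the interval that
    # the target policy would have chosen for the same context.
    match = (obs_bin == tgt_bin).astype(float)
    # Probability that the behavior policy picks an action in that interval.
    propensity = behavior_interval_prob(contexts, obs_bin)
    return float(np.mean(match * rewards / propensity))


def simulate_and_estimate(n=20000, n_bins=10, seed=0):
    """Synthetic check with a hypothetical reward model (illustration only)."""
    rng = np.random.default_rng(seed)
    contexts = rng.uniform(0.0, 1.0, size=n)
    # Behavior policy: actions drawn uniformly on [0, 1], ignoring context.
    actions = rng.uniform(0.0, 1.0, size=n)
    # Reward peaks when the action equals the context.
    rewards = 1.0 - (actions - contexts) ** 2 + rng.normal(0.0, 0.1, size=n)

    edges = np.linspace(0.0, 1.0, n_bins + 1)
    widths = np.diff(edges)

    def target_policy(x):
        # Deterministic target policy; its true value is 1.0 in this model.
        return x

    def behavior_interval_prob(x, bins):
        # Under a uniform behavior policy, this is just the interval width.
        return widths[bins]

    return discretized_ips_value(contexts, actions, rewards, target_policy,
                                 edges, behavior_interval_prob)
```

Calling `simulate_and_estimate()` returns an estimate that should be close to the true value of 1.0 in this toy model. The coarseness of the grid trades bias (wide intervals blur the reward) against variance (narrow intervals leave few matching samples), which is the trade-off an adaptive partition is meant to manage.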
