Deep Jump Q-Evaluation for Offline Policy Evaluation in Continuous Action Space

10/29/2020

We consider off-policy evaluation (OPE) in continuous action domains, such as dynamic pricing and personalized dose finding. In OPE, one aims to estimate the value of a new policy using historical data generated by a different behavior policy. Most existing work on OPE focuses on discrete action domains. To handle continuous action spaces, we develop a new deep jump Q-evaluation method for OPE. The key ingredient of our method is adaptive discretization of the action space using deep jump Q-learning, which allows existing OPE methods for discrete domains to be applied to continuous actions. Our method is further supported by theoretical results and by experiments on synthetic and real datasets.
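To make the discretization idea concrete, here is a minimal sketch of how a discrete OPE estimator can be applied once a continuous action range has been split into intervals. It is not the deep jump Q-evaluation method itself: a fixed, user-supplied grid stands in for the adaptive partition that the paper obtains from deep jump Q-learning, the estimator is a plain inverse-probability-weighted (IPW) average, and the behavior policy is assumed not to depend on the context. The function names (`interval_index`, `discretized_ipw_value`) and all parameters are hypothetical.

```python
import numpy as np


def interval_index(actions, edges):
    """Return, for each action, the index of the interval of `edges` containing it.

    `edges` is an increasing array [e_0, ..., e_m] defining m intervals.
    The right endpoint e_m is assigned to the last interval.
    """
    idx = np.searchsorted(edges, actions, side="right") - 1
    return np.clip(idx, 0, len(edges) - 2)


def discretized_ipw_value(actions, rewards, target_actions, edges, behavior_interval_prob):
    """IPW value estimate after discretizing the action space.

    Assumptions (not from the paper): the partition `edges` is fixed in
    advance, and `behavior_interval_prob[k]` is the probability that the
    behavior policy picks an action in interval k, independent of context.
    """
    observed_bin = interval_index(np.asarray(actions), edges)
    target_bin = interval_index(np.asarray(target_actions), edges)
    # Keep only samples whose logged action falls in the interval that the
    # target policy would have chosen, reweighted by the behavior probability.
    match = (observed_bin == target_bin).astype(float)
    weights = match / np.asarray(behavior_interval_prob)[target_bin]
    return float(np.mean(weights * np.asarray(rewards)))


# Toy data: actions in [0, 1], two intervals, behavior policy uniform on [0, 1].
edges = np.array([0.0, 0.5, 1.0])
behavior_interval_prob = np.diff(edges)  # uniform density: probability = interval length
logged_actions = np.array([0.1, 0.6, 0.9, 0.4])
logged_rewards = np.array([1.0, 2.0, 3.0, 4.0])
target_actions = np.array([0.2, 0.2, 0.7, 0.8])  # actions the target policy would take

value = discretized_ipw_value(
    logged_actions, logged_rewards, target_actions, edges, behavior_interval_prob
)
print(value)  # 2.0
```

With a fixed grid, the number of intervals trades bias against variance: wider intervals match more logged samples but treat more distinct actions as equivalent. An adaptive partition, as described in the abstract, is meant to avoid choosing that grid by hand.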
