Importance Sampling Policy Evaluation with an Estimated Behavior Policy

06/04/2018
by Josiah Hanna, et al.

In reinforcement learning, off-policy evaluation is the task of using data generated by one policy to determine the expected return of a second policy. Importance sampling is a standard technique for off-policy evaluation, allowing off-policy data to be used as if it were on-policy. When the policy that generated the off-policy data is unknown, the ordinary importance sampling estimator cannot be applied. In this paper, we study a family of regression importance sampling (RIS) methods that apply importance sampling by first estimating the behavior policy. We find that these estimators give strong empirical performance, surprisingly often outperforming importance sampling with the true behavior policy in both discrete and continuous domains. Our results emphasize the importance of estimating the behavior policy using only the data that will also be used for the importance sampling estimate.
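For intuition, the sketch below shows one hypothetical instantiation of the idea in a small discrete domain: the behavior policy is estimated by count-based maximum likelihood from the logged trajectories, and those estimated action probabilities replace the (unknown) true behavior probabilities inside an ordinary per-trajectory importance sampling estimator. The function names, the `(state, action, reward)` trajectory format, and the count-based estimator are illustrative assumptions; the paper's RIS family is more general than this sketch.

```python
import numpy as np
from collections import defaultdict

def estimate_behavior_policy(trajectories, n_actions):
    """Count-based maximum-likelihood estimate of the behavior policy,
    fit on the same data that will be used for the importance sampling
    estimate (as the paper's results emphasize)."""
    counts = defaultdict(lambda: np.zeros(n_actions))
    for traj in trajectories:
        for s, a, _ in traj:
            counts[s][a] += 1
    return {s: c / c.sum() for s, c in counts.items()}

def ris_estimate(trajectories, pi_e, n_actions, gamma=1.0):
    """Importance sampling estimate of the evaluation policy's expected
    return, using the estimated (rather than true) behavior policy.

    trajectories: list of trajectories, each a list of (state, action, reward).
    pi_e: dict mapping state -> array of evaluation-policy action probabilities.
    """
    pi_b_hat = estimate_behavior_policy(trajectories, n_actions)
    values = []
    for traj in trajectories:
        weight, ret, discount = 1.0, 0.0, 1.0
        for s, a, r in traj:
            # Per-step importance ratio with the estimated behavior policy.
            weight *= pi_e[s][a] / pi_b_hat[s][a]
            ret += discount * r
            discount *= gamma
        values.append(weight * ret)
    return float(np.mean(values))

# Example usage on toy data: two states, two actions.
trajs = [[(0, 1, 1.0), (1, 0, 0.0)], [(0, 0, 0.0), (1, 1, 1.0)]]
pi_e = {0: np.array([0.2, 0.8]), 1: np.array([0.5, 0.5])}
print(ris_estimate(trajs, pi_e, n_actions=2))
```

Replacing the true behavior probabilities with estimates fit on the evaluation data itself is what distinguishes this from ordinary importance sampling, and it is the design choice the paper's experiments examine.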


Related research

11/10/2016 · Importance Sampling with Unequal Support
Importance sampling is often used in machine learning when training and ...

09/13/2021 · State Relevance for Off-Policy Evaluation
Importance sampling-based estimators for off-policy evaluation (OPE) are...

04/03/2017 · A comparative study of counterfactual estimators
We provide a comparative study of several widely used off-policy estimat...

06/12/2021 · A Deep Reinforcement Learning Approach to Marginalized Importance Sampling with the Successor Representation
Marginalized importance sampling (MIS), which measures the density ratio...

11/22/2021 · Case-based off-policy policy evaluation using prototype learning
Importance sampling (IS) is often used to perform off-policy policy eval...

07/03/2018 · Behaviour Policy Estimation in Off-Policy Policy Evaluation: Calibration Matters
In this work, we consider the problem of estimating a behaviour policy f...

10/02/2017 · Oracle Importance Sampling for Stochastic Simulation Models
We consider the problem of estimating an expected outcome from a stochas...
