Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning

06/09/2019
by Nathan Kallus, et al.

Off-policy evaluation (OPE) in both contextual bandits and reinforcement learning allows one to evaluate novel decision policies without needing to conduct exploration, which is often costly or otherwise infeasible. The problem's importance has attracted many proposed solutions, including importance sampling (IS), self-normalized IS (SNIS), and doubly robust (DR) estimates. DR and its variants ensure semiparametric local efficiency if the Q-functions are well-specified, but if they are not, they can be worse than both IS and SNIS; moreover, DR does not enjoy SNIS's inherent stability and boundedness. We propose new estimators for OPE based on empirical likelihood that are always more efficient than IS, SNIS, and DR, and that satisfy the same stability and boundedness properties as SNIS. Along the way, we catalogue various desirable properties of OPE estimators and classify existing estimators according to them. Beyond these theoretical guarantees, empirical studies suggest the new estimators provide practical advantages.
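For reference, here is a minimal sketch of the standard contextual-bandit forms of the three baseline estimators named above (these are the usual textbook definitions, not formulas reproduced from the paper; the notation for the evaluation policy \pi_e, logging policy \pi_b, and estimated Q-function \hat Q is assumed for illustration):

\[
\hat V_{\mathrm{IS}} = \frac{1}{n}\sum_{i=1}^{n} \rho_i r_i,
\qquad
\hat V_{\mathrm{SNIS}} = \frac{\sum_{i=1}^{n} \rho_i r_i}{\sum_{i=1}^{n} \rho_i},
\qquad
\hat V_{\mathrm{DR}} = \frac{1}{n}\sum_{i=1}^{n}\Bigl[\sum_{a}\pi_e(a \mid x_i)\,\hat Q(x_i,a) + \rho_i\bigl(r_i - \hat Q(x_i,a_i)\bigr)\Bigr],
\]
where \rho_i = \pi_e(a_i \mid x_i)/\pi_b(a_i \mid x_i) is the importance weight for the logged tuple (x_i, a_i, r_i). Roughly speaking, SNIS normalizes the weights so the estimate remains within the range of observed rewards, while DR adds a Q-based control variate: a well-specified \hat Q yields local efficiency, but a poorly specified one can make DR noisier than plain IS or SNIS.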

