Conformal Off-Policy Prediction in Contextual Bandits

06/09/2022
by Muhammad Faaiz Taufiq, et al.

Most off-policy evaluation methods for contextual bandits have focused on the expected outcome of a policy, which is estimated via methods that at best provide only asymptotic guarantees. However, in many applications, the expectation may not be the best measure of performance as it does not capture the variability of the outcome. In addition, particularly in safety-critical settings, stronger guarantees than asymptotic correctness may be required. To address these limitations, we consider a novel application of conformal prediction to contextual bandits. Given data collected under a behavioral policy, we propose conformal off-policy prediction (COPP), which can output reliable predictive intervals for the outcome under a new target policy. We provide theoretical finite-sample guarantees without making any additional assumptions beyond the standard contextual bandit setup, and empirically demonstrate the utility of COPP compared with existing methods on synthetic and real-world data.
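
To make the idea of a conformal interval under a shifted policy more concrete, here is a minimal sketch of weighted split conformal prediction, the general technique that importance-weighted conformal methods build on. It is not taken from the paper and should not be read as the authors' COPP algorithm. The function names, the use of absolute residuals as conformity scores, and the assumption that the importance weights (the ratio of outcome likelihood under the target policy to that under the behavioral policy) are already available are all illustrative assumptions.

```python
import math


def weighted_conformal_quantile(scores, weights, test_weight, alpha):
    """Weighted (1 - alpha) quantile of calibration scores.

    scores      : conformity scores on calibration data, e.g. |y - prediction|
    weights     : assumed importance weights for the calibration points
    test_weight : assumed importance weight for the test point
    Returns math.inf when the calibration mass alone cannot reach 1 - alpha.
    """
    total = sum(weights) + test_weight
    threshold = (1.0 - alpha) * total
    cumulative = 0.0
    # Walk through the scores in increasing order, accumulating weight.
    for score, weight in sorted(zip(scores, weights)):
        cumulative += weight
        if cumulative >= threshold:
            return score
    # The remaining mass sits on the test point, which is placed at +infinity.
    return math.inf


def predictive_interval(point_prediction, quantile):
    """Symmetric interval around a point prediction (hypothetical helper)."""
    return (point_prediction - quantile, point_prediction + quantile)


if __name__ == "__main__":
    # Toy numbers, purely illustrative.
    calibration_scores = [1.0, 2.0, 3.0, 4.0]
    calibration_weights = [1.0, 1.0, 1.0, 1.0]
    q = weighted_conformal_quantile(
        calibration_scores, calibration_weights, test_weight=1.0, alpha=0.5
    )
    print(predictive_interval(10.0, q))
```

With uniform weights this reduces to ordinary split conformal prediction. Larger weights on some calibration points pull the quantile toward their scores, which is how data logged under one policy can be reweighted toward another. In the off-policy setting the weights generally have to be estimated, and the paper should be consulted for how COPP handles that.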

research
06/07/2019

Empirical Likelihood for Contextual Bandits

We apply empirical likelihood techniques to contextual bandit policy val...
research
05/06/2021

Contextual Bandits with Sparse Data in Web setting

This paper is a scoping study to identify current methods used in handli...
research
12/15/2018

Balanced Linear Contextual Bandits

Contextual bandit algorithms are sensitive to the estimation method of t...
research
06/16/2020

Off-policy Bandits with Deficient Support

Learning effective contextual-bandit policies from past actions of a dep...
research
02/08/2015

Learning to Search Better Than Your Teacher

Methods for learning to search for structured prediction typically imita...
research
06/14/2022

Conformal Off-Policy Prediction

Off-policy evaluation is critical in a number of applications where new ...
research
07/13/2021

Inverse Contextual Bandits: Learning How Behavior Evolves over Time

Understanding an agent's priorities by observing their behavior is criti...