Conformal Off-Policy Prediction

06/14/2022
by   Yingying Zhang, et al.

Off-policy evaluation is critical in a number of applications where new policies need to be evaluated offline before online deployment. Most existing methods focus on the expected return, define the target parameter through averaging, and provide only a point estimator. In this paper, we develop a novel procedure to produce reliable interval estimators for a target policy's return starting from any initial state. Our proposal accounts for the variability of the return around its expectation, focuses on the individual effect, and offers valid uncertainty quantification. Our main idea lies in designing a pseudo policy that generates subsamples as if they were sampled from the target policy, so that existing conformal prediction algorithms can be applied to construct prediction intervals. Our methods are supported by theory and by experiments on synthetic data and on real data from short-video platforms.
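To make the subsample-then-conformalize idea concrete, here is a minimal sketch under strong simplifying assumptions. It is not the paper's procedure: it uses a single-step setting with two actions, state-independent policies with known probabilities, plain rejection sampling as a stand-in for the pseudo policy, and split conformal prediction with a linear fit. All function names, the data-generating model, and the policy probabilities are hypothetical and chosen only for illustration.

```python
import numpy as np

def simulate(n, pi, rng):
    """Toy data (assumed model): outcome depends on state x and action a."""
    x = rng.uniform(-1, 1, n)
    a = rng.choice(2, size=n, p=pi)
    y = x + 2.0 * a + rng.normal(0, 1, n)
    return x, a, y

def rejection_subsample(actions, pi_b, pi_t, rng):
    """Boolean mask; kept rows have actions distributed as under pi_t.

    Each row is accepted with probability ratio / max ratio, where
    ratio = pi_t(a) / pi_b(a) for the logged action a.
    """
    ratio = pi_t / pi_b
    accept = ratio[actions] / ratio.max()
    return rng.random(len(actions)) < accept

def split_conformal_interval(x_fit, y_fit, x_cal, y_cal, x_new, alpha):
    """Split conformal interval using absolute residuals of a linear fit."""
    coef = np.polyfit(x_fit, y_fit, deg=1)
    scores = np.sort(np.abs(y_cal - np.polyval(coef, x_cal)))
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))   # conformal quantile index
    q = scores[k - 1] if k <= n else np.inf
    center = np.polyval(coef, x_new)
    return center - q, center + q

rng = np.random.default_rng(0)
pi_b = np.array([0.5, 0.5])   # behavior policy (assumed known)
pi_t = np.array([0.1, 0.9])   # target policy (assumed known)

# Logged data come from the behavior policy.
x, a, y = simulate(20000, pi_b, rng)

# Subsample so the kept rows look as if drawn from the target policy.
keep = rejection_subsample(a, pi_b, pi_t, rng)
xs, ys = x[keep], y[keep]
half = len(xs) // 2

# Fresh draws under the target policy, used only to check coverage.
x_new, _, y_new = simulate(5000, pi_t, rng)
lo, hi = split_conformal_interval(
    xs[:half], ys[:half], xs[half:], ys[half:], x_new, alpha=0.1
)
coverage = np.mean((lo <= y_new) & (y_new <= hi))
print(f"kept {keep.sum()} of {len(a)} rows, empirical coverage {coverage:.3f}")
```

In this toy setup the empirical coverage should land near the nominal 90% level, because the retained rows are exchangeable with fresh draws from the target policy. The sequential setting described in the abstract, where the return accumulates over many steps, needs more care than this one-step illustration suggests.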

research
02/22/2022

Off-Policy Confidence Interval Estimation with Confounded Markov Decision Process

This paper is concerned with constructing a confidence interval for a ta...
research
04/05/2023

Conformal Off-Policy Evaluation in Markov Decision Processes

Reinforcement Learning aims at identifying and evaluating efficient cont...
research
05/10/2021

Deeply-Debiased Off-Policy Interval Estimation

Off-policy evaluation learns a target policy's value with a historical d...
research
05/31/2023

Offline Meta Reinforcement Learning with In-Distribution Online Adaptation

Recent offline meta-reinforcement learning (meta-RL) methods typically u...
research
06/09/2022

Conformal Off-Policy Prediction in Contextual Bandits

Most off-policy evaluation methods for contextual bandits have focused o...
research
12/29/2022

Quantile Off-Policy Evaluation via Deep Conditional Generative Learning

Off-Policy evaluation (OPE) is concerned with evaluating a new target po...
research
08/31/2021

Evaluating the Robustness of Off-Policy Evaluation

Off-policy Evaluation (OPE), or offline evaluation in general, evaluates...
