Expert-Supervised Reinforcement Learning for Offline Policy Learning and Evaluation

06/23/2020
by Aaron Sonabend W, et al.

Offline Reinforcement Learning (RL) is a promising approach for learning optimal policies in environments where direct exploration is expensive or infeasible. However, adopting such policies in practice is often challenging: they are hard to interpret within the application context, and they lack measures of uncertainty for the learned policy value and its decisions. To overcome these issues, we propose an Expert-Supervised RL (ESRL) framework, which uses uncertainty quantification for offline policy learning. We make three contributions: 1) the method can learn safe and optimal policies through hypothesis testing; 2) ESRL allows for different levels of risk aversion within the application context; and 3) we propose a way to interpret ESRL's policy at every state through posterior distributions, and use this framework to compute off-policy value function posteriors. We provide theoretical guarantees for our estimators and regret bounds consistent with Posterior Sampling for RL (PSRL) that account for any risk aversion threshold. We further propose an offline version of PSRL as a special case of ESRL.
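
The idea of choosing actions through a posterior-based hypothesis test with a risk aversion threshold can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes posterior draws of the action values at a single state are already available, and the names `select_action`, `q_samples`, `behavior_action`, and `alpha` are hypothetical. The rule it implements is a simplified stand-in: deviate from the behavior (expert) action only when the posterior probability that the candidate action is better reaches at least 1 - alpha.

```python
import numpy as np


def posterior_prob_better(q_samples, candidate, baseline):
    """Fraction of posterior draws in which `candidate` beats `baseline`."""
    q_samples = np.asarray(q_samples, dtype=float)
    return float(np.mean(q_samples[:, candidate] > q_samples[:, baseline]))


def select_action(q_samples, behavior_action, alpha):
    """Pick an action at one state from posterior draws of Q(s, a).

    q_samples: array of shape (n_draws, n_actions), assumed to be draws
        from some posterior over action values (hypothetical input).
    behavior_action: index of the action the behavior policy would take.
    alpha: risk aversion threshold in (0, 1); smaller values make it
        harder to deviate from the behavior action.
    """
    q_samples = np.asarray(q_samples, dtype=float)
    # Candidate: the action with the highest posterior mean value.
    candidate = int(np.argmax(q_samples.mean(axis=0)))
    if candidate == behavior_action:
        return behavior_action
    # Deviate only if the evidence against the behavior action is strong.
    if posterior_prob_better(q_samples, candidate, behavior_action) >= 1.0 - alpha:
        return candidate
    return behavior_action


# Usage with synthetic draws (illustrative numbers, not results from the paper).
rng = np.random.default_rng(0)
draws = rng.normal(loc=[1.0, 1.2, 0.5], scale=0.3, size=(1000, 3))
print(select_action(draws, behavior_action=0, alpha=0.05))
print(select_action(draws, behavior_action=0, alpha=0.5))
```

Under this simplified rule, lowering `alpha` keeps the chosen policy closer to the behavior policy, while raising it lets the policy follow the posterior mean more often.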

research
02/23/2022

Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning

Offline Reinforcement Learning (RL) aims to learn policies from previous...
research
03/24/2022

Bellman Residual Orthogonalization for Offline Reinforcement Learning

We introduce a new reinforcement learning principle that approximates th...
research
11/30/2020

IV-Posterior: Inverse Value Estimation for Interpretable Policy Certificates

Model-free reinforcement learning (RL) is a powerful tool to learn a bro...
research
10/19/2022

Robust Offline Reinforcement Learning with Gradient Penalty and Constraint Relaxation

A promising paradigm for offline reinforcement learning (RL) is to const...
research
03/13/2018

Hierarchical Reinforcement Learning: Approximating Optimal Discounted TSP Using Local Policies

In this work, we provide theoretical guarantees for reward decomposition...
research
02/01/2023

Selective Uncertainty Propagation in Offline RL

We study the finite-horizon offline reinforcement learning (RL) problem....
research
02/07/2020

Provably efficient reconstruction of policy networks

Recent research has shown that learning policies parametrized by large ...
