Blessing from Experts: Super Reinforcement Learning in Confounded Environments

09/29/2022
by Jiayi Wang, et al.

We introduce super reinforcement learning in the batch setting, which takes the observed action as input for enhanced policy learning. In the presence of unmeasured confounders, the recommendations from human experts recorded in the observed data allow us to recover certain unobserved information. By including this information in the policy search, super reinforcement learning yields a super-policy that is guaranteed to outperform both the standard optimal policy and the behavior policy (e.g., the expert's recommendation). Furthermore, to address unmeasured confounding in finding super-policies, we establish a number of non-parametric identification results. Finally, we develop two super-policy learning algorithms and derive their corresponding finite-sample regret guarantees.
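
To make the comparison between the three policy classes concrete, here is a minimal toy sketch in Python. It is not one of the paper's algorithms, and it sidesteps the identification problem entirely by assuming the full model is known. Everything in it is a hypothetical assumption chosen for illustration: a binary unobserved confounder `u`, an expert who sees `u` but picks the mismatched action 80% of the time, and a reward of 1 when the chosen action equals `u`. The names `P_U`, `p_expert`, `reward`, and `value` are not from the source.

```python
from itertools import product

# Hypothetical toy model (all numbers are illustrative assumptions).
P_U = {0: 0.5, 1: 0.5}  # distribution of the unobserved confounder u


def p_expert(a_b, u):
    """Probability that the expert recommends a_b given u.

    Assumption: the expert observes u but recommends 1 - u with prob. 0.8.
    """
    return 0.8 if a_b == 1 - u else 0.2


def reward(u, a):
    """Assumed reward: 1 if the chosen action matches u, else 0."""
    return 1.0 if a == u else 0.0


def value(policy):
    """Exact expected reward of a policy mapping the expert's action to an action."""
    return sum(
        P_U[u] * p_expert(a_b, u) * reward(u, policy(a_b))
        for u, a_b in product((0, 1), repeat=2)
    )


# Behavior policy: follow the expert's recommendation.
behavior_value = value(lambda a_b: a_b)

# Standard policy: ignores the expert's action (a constant action here,
# since this toy model has no observed state).
standard_value = max(value(lambda a_b, a=a: a) for a in (0, 1))

# Super-policy: any map from the expert's action to an action.
super_value = max(
    value(lambda a_b, m=m: m[a_b]) for m in product((0, 1), repeat=2)
)

print(f"behavior: {behavior_value:.2f}")
print(f"standard: {standard_value:.2f}")
print(f"super:    {super_value:.2f}")
```

In this toy model the super-policy class contains both the constant policies and the "follow the expert" policy, so its best member can never do worse than either. Here it does strictly better because the expert's recommendation carries information about `u` that the standard policy cannot see.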
