Scalable Bayesian Inverse Reinforcement Learning

02/12/2021
by Alex J. Chan, et al.

Bayesian inference over the reward presents an ideal solution to the ill-posed nature of the inverse reinforcement learning problem. Unfortunately, current methods generally do not scale beyond the small tabular setting because they need an inner-loop MDP solver, and even non-Bayesian methods that do scale often require extensive interaction with the environment to perform well, which makes them inappropriate for high-stakes or costly applications such as healthcare. In this paper we introduce Approximate Variational Reward Imitation Learning (AVRIL), which addresses both issues by taking a variational approach to the latent reward: it jointly learns an approximate posterior distribution over the reward, one that scales to arbitrarily complicated state spaces, alongside an appropriate policy, in a completely offline manner. Applying our method to real medical data alongside classic control simulations, we demonstrate Bayesian reward inference in environments beyond the scope of current methods, as well as task performance competitive with focused offline imitation learning algorithms.
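
To make the idea of jointly fitting a reward posterior and a policy from logged data more concrete, here is a minimal sketch of what a variational objective of this kind might look like. The abstract does not specify the objective, so everything below is an assumption for illustration: a tabular Q-table stands in for a learned network, the reward posterior is a per-pair Gaussian with a standard normal prior, the policy is a Boltzmann (softmax) policy over Q-values, and the names `avril_style_loss`, `q_table`, `reward_posterior`, `lam`, and `beta` are hypothetical. The full text should be consulted for the method's actual formulation.

```python
import math


def log_softmax(values, beta=1.0):
    """Log-probabilities of a Boltzmann policy over a list of Q-values."""
    scaled = [beta * v for v in values]
    m = max(scaled)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]


def gaussian_kl_to_standard_normal(mean, std):
    """KL( N(mean, std^2) || N(0, 1) ) in closed form."""
    return 0.5 * (std ** 2 + mean ** 2 - 1.0) - math.log(std)


def gaussian_log_pdf(x, mean, std):
    """Log-density of N(mean, std^2) evaluated at x."""
    return -0.5 * math.log(2 * math.pi) - math.log(std) - 0.5 * ((x - mean) / std) ** 2


def avril_style_loss(transitions, q_table, reward_posterior,
                     gamma=0.99, lam=1.0, beta=1.0):
    """Average per-transition loss on logged (s, a, s_next, a_next) tuples.

    Assumed terms (illustrative, not taken from the abstract):
      * negative log-likelihood of the demonstrated action under a
        softmax policy over Q-values,
      * KL from the Gaussian reward posterior to a standard normal prior,
      * a consistency term asking the reward implied by the Q-values,
        Q(s, a) - gamma * Q(s_next, a_next), to be likely under the posterior.
    """
    total = 0.0
    for s, a, s_next, a_next in transitions:
        mean, std = reward_posterior[(s, a)]
        log_pi = log_softmax(q_table[s], beta)[a]
        kl = gaussian_kl_to_standard_normal(mean, std)
        implied_reward = q_table[s][a] - gamma * q_table[s_next][a_next]
        consistency = gaussian_log_pdf(implied_reward, mean, std)
        total += -log_pi + kl - lam * consistency
    return total / len(transitions)


# Toy data: two states, two actions, one logged transition.
q_table = {0: [1.0, 0.0], 1: [0.0, 0.0]}
reward_posterior = {(0, 0): (1.0, 1.0)}  # (mean, std) per state-action pair
transitions = [(0, 0, 1, 0)]
loss = avril_style_loss(transitions, q_table, reward_posterior)
```

Nothing in this sketch queries an environment or solves an MDP in an inner loop: the loss depends only on logged transitions, which is the property the abstract emphasizes. In a scalable version, the table and the per-pair Gaussian would presumably be replaced by function approximators trained by gradient descent on the same kind of loss.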
