OPIRL: Sample Efficient Off-Policy Inverse Reinforcement Learning via Distribution Matching

09/09/2021
by   Hana Hoshino, et al.
29

Inverse Reinforcement Learning (IRL) is attractive in scenarios where reward engineering can be tedious. However, prior IRL algorithms use on-policy transitions, which require intensive sampling from the current policy for stable and optimal performance. This limits IRL applications in the real world, where environment interactions can become highly expensive. To tackle this problem, we present Off-Policy Inverse Reinforcement Learning (OPIRL), which (1) adopts off-policy data distribution instead of on-policy and enables significant reduction of the number of interactions with the environment, (2) learns a stationary reward function that is transferable with high generalization capabilities on changing dynamics, and (3) leverages mode-covering behavior for faster convergence. We demonstrate that our method is considerably more sample efficient and generalizes to novel environments through the experiments. Our method achieves better or comparable results on policy performance baselines with significantly fewer interactions. Furthermore, we empirically show that the recovered reward function generalizes to different tasks where prior arts are prone to fail.

READ FULL TEXT

page 1

page 5

page 10

page 11

page 12

09/09/2018

Addressing Sample Inefficiency and Reward Bias in Inverse Reinforcement Learning

The Generative Adversarial Imitation Learning (GAIL) framework from Ho &...
02/01/2023

Internally Rewarded Reinforcement Learning

We study a class of reinforcement learning problems where the reward sig...
06/24/2020

Quantifying Differences in Reward Functions

For many tasks, the reward function is too complex to be specified proce...
07/18/2022

Active Exploration for Inverse Reinforcement Learning

Inverse Reinforcement Learning (IRL) is a powerful paradigm for inferrin...
09/25/2022

Reward Learning using Structural Motifs in Inverse Reinforcement Learning

The Inverse Reinforcement Learning (IRL) problem has seen rapid evolutio...
09/15/2023

A Bayesian Approach to Robust Inverse Reinforcement Learning

We consider a Bayesian approach to offline model-based inverse reinforce...
09/27/2017

A Policy Search Method For Temporal Logic Specified Reinforcement Learning Tasks

Reward engineering is an important aspect of reinforcement learning. Whe...

Code Repositories

Please sign up or login with your details

Forgot password? Click here to reset