OPIRL: Sample Efficient Off-Policy Inverse Reinforcement Learning via Distribution Matching

by Hana Hoshino, et al.

Inverse Reinforcement Learning (IRL) is attractive in scenarios where reward engineering is tedious. However, prior IRL algorithms use on-policy transitions, which require intensive sampling from the current policy for stable and optimal performance. This limits IRL applications in the real world, where environment interactions can be highly expensive. To tackle this problem, we present Off-Policy Inverse Reinforcement Learning (OPIRL), which (1) uses an off-policy data distribution instead of an on-policy one, significantly reducing the number of interactions with the environment, (2) learns a stationary reward function that transfers and generalizes well under changing dynamics, and (3) leverages mode-covering behavior for faster convergence. Our experiments demonstrate that the method is considerably more sample efficient and generalizes to novel environments. It achieves policy performance better than or comparable to the baselines with significantly fewer interactions. Furthermore, we empirically show that the recovered reward function generalizes to different tasks where prior methods are prone to fail.




