Hybrid Adversarial Inverse Reinforcement Learning

02/04/2021
by   Mingqi Yuan, et al.
1

In this paper, we investigate the problem of the inverse reinforcement learning (IRL), especially the beyond-demonstrator (BD) IRL. The BD-IRL aims to not only imitate the expert policy but also extrapolate BD policy based on finite demonstrations of the expert. Currently, most of the BD-IRL algorithms are two-stage, which first infer a reward function then learn the policy via reinforcement learning (RL). Because of the two separate procedures, the two-stage algorithms have high computation complexity and lack robustness. To overcome these flaw, we propose a BD-IRL framework entitled hybrid adversarial inverse reinforcement learning (HAIRL), which successfully integrates the imitation and exploration into one procedure. The simulation results show that the HAIRL is more efficient and robust when compared with other similar state-of-the-art (SOTA) algorithms.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset