Active Exploration for Inverse Reinforcement Learning

by David Lindner et al.

Inverse Reinforcement Learning (IRL) is a powerful paradigm for inferring a reward function from expert demonstrations. Many IRL algorithms require a known transition model, sometimes a known expert policy, or at least access to a generative model. However, these assumptions are too strong for many real-world applications, where the environment can be accessed only through sequential interaction. We propose a novel IRL algorithm: Active exploration for Inverse Reinforcement Learning (AceIRL), which actively explores an unknown environment and expert policy to quickly learn the expert's reward function and identify a good policy. AceIRL uses previous observations to construct confidence intervals that capture plausible reward functions and to find exploration policies that focus on the most informative regions of the environment. AceIRL is the first approach to active IRL with sample-complexity bounds that does not require a generative model of the environment. In the worst case, AceIRL matches the sample complexity of active IRL with a generative model. Additionally, we establish a problem-dependent bound that relates the sample complexity of AceIRL to the suboptimality gap of a given IRL problem. We empirically evaluate AceIRL in simulations and find that it significantly outperforms more naive exploration strategies.
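The core idea described above, maintaining confidence intervals over plausible rewards and steering exploration toward the most uncertain regions, can be illustrated with a toy sketch. This is a hypothetical illustration of the general principle, not the authors' algorithm or code: all names, the tiny tabular MDP, and the Hoeffding-style interval widths are assumptions made here for concreteness.

```python
import numpy as np

# Toy sketch of confidence-driven active exploration (hypothetical, not AceIRL itself):
# interact with an unknown MDP, keep per-(state, action) confidence intervals on the
# reward, and explore the state-action pairs whose intervals are currently widest.

rng = np.random.default_rng(0)

n_states, n_actions = 4, 2
true_reward = rng.uniform(size=(n_states, n_actions))  # unknown to the learner

counts = np.zeros((n_states, n_actions))      # visit counts per state-action pair
reward_sum = np.zeros((n_states, n_actions))  # running sums of observed rewards

def confidence_width(counts):
    """Hoeffding-style interval width, shrinking as 1/sqrt(n) with more visits."""
    return 1.0 / np.sqrt(np.maximum(counts, 1.0))

def explore_action(state):
    """Pick the action with the widest reward confidence interval (most informative)."""
    return int(np.argmax(confidence_width(counts)[state]))

state = 0
for _ in range(200):
    action = explore_action(state)
    counts[state, action] += 1
    reward_sum[state, action] += true_reward[state, action] + rng.normal(scale=0.1)
    state = int(rng.integers(n_states))  # placeholder for unknown transition dynamics

estimate = reward_sum / np.maximum(counts, 1.0)
```

After 200 interactions, every state-action pair has been visited and the empirical reward estimate is close to the true reward, which is the qualitative behavior the paper's confidence-interval construction is designed to guarantee with formal sample-complexity bounds.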

