Imitation Learning with a Value-Based Prior

06/20/2012
by Umar Syed, et al.

The goal of imitation learning is for an apprentice to learn how to behave in a stochastic environment by observing a mentor demonstrating the correct behavior. Accurate prior knowledge about the correct behavior can reduce the need for demonstrations from the mentor. We present a novel approach to encoding prior knowledge about the correct behavior, in which the prior knowledge takes the form of a Markov Decision Process (MDP) that the apprentice uses as a rough and imperfect model of the mentor's behavior. Specifically, taking a Bayesian approach, we treat the value of a policy in this modeling MDP as the log prior probability of the policy. In other words, we assume a priori that the mentor's behavior is likely to be a high-value policy in the modeling MDP, though quite possibly different from the optimal policy. We describe an efficient algorithm that, given a modeling MDP and a set of demonstrations by a mentor, provably converges to a stationary point of the log posterior of the mentor's policy, where the posterior is computed with respect to this "value-based" prior. We also present empirical evidence that the prior does in fact speed learning of the mentor's policy, and that it improves on similar previous methods in our experiments.
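As a rough sketch of the setup described in the abstract (the notation below is ours, not taken from the paper): let M be the modeling MDP, V_M(pi) the value of a policy pi in M, and D the set of state-action pairs demonstrated by the mentor. The value-based prior and the resulting log posterior can then be written as

\[
\log P(\pi) = V_M(\pi) + \mathrm{const},
\qquad
\log P(\pi \mid D) = \sum_{(s,a) \in D} \log \pi(a \mid s) \;+\; V_M(\pi) \;+\; \mathrm{const},
\]

where the likelihood term assumes the mentor chooses actions according to a stationary stochastic policy pi. The algorithm described in the abstract provably converges to a stationary point of this log posterior.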

Related research

- Compressed imitation learning (09/18/2020): In analogy to compressed sensing, which allows sample-efficient signal r...
- Residual Q-Learning: Offline and Online Policy Customization without Value (06/15/2023): Imitation Learning (IL) is a widely used framework for learning imitativ...
- Imitation Refinement (05/07/2018): Many real-world tasks involve identifying patterns from data satisfying ...
- Toward the Fundamental Limits of Imitation Learning (09/13/2020): Imitation learning (IL) aims to mimic the behavior of an expert policy i...
- Learning non-Markovian Decision-Making from State-only Sequences (06/27/2023): Conventional imitation learning assumes access to the actions of demonst...
- Learning Feasibility to Imitate Demonstrators with Different Dynamics (10/28/2021): The goal of learning from demonstrations is to learn a policy for an age...
- Receding Horizon Curiosity (10/08/2019): Sample-efficient exploration is crucial not only for discovering rewardi...
