Sequence Model Imitation Learning with Unobserved Contexts

08/03/2022
by Gokul Swamy, et al.

We consider imitation learning problems where the expert has access to a per-episode context that is hidden from the learner, both in the demonstrations and at test time. The learner might not be able to accurately reproduce expert behavior early in an episode, but by considering the entire history of states and actions, it might eventually be able to identify the context and act as the expert would. We prove that on-policy imitation learning algorithms (with or without access to a queryable expert) are better equipped than off-policy methods to handle these asymptotically realizable problems, and that they avoid the latching behavior (naive repetition of past actions) that plagues off-policy methods. Experiments in a toy bandit domain show sharp phase transitions in whether off-policy approaches are able to match expert performance asymptotically, in contrast to the uniformly good performance of on-policy approaches. On several continuous control tasks, we demonstrate that on-policy approaches are able to use history to identify the context, while off-policy approaches actually perform worse when given access to history.
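The contrast between identifying the context from history and latching can be illustrated with a minimal sketch. This is not the paper's experimental setup: the two-armed bandit, the noisy per-step signal, the `noise` parameter, and every function name below are assumptions made purely for illustration. The hidden context is the index of the correct arm, and the learner sees only its own past actions and a noisy signal of whether each one matched.

```python
import random


def run_episode(policy, context, horizon, rng, noise=0.1):
    """Roll out `policy` for one episode; return the fraction of correct actions.

    Hypothetical setup: `context` (0 or 1) is the correct arm and is never
    shown to the policy.
    """
    history = []  # (action, signal) pairs observed so far
    correct = 0
    for _ in range(horizon):
        action = policy(history, rng)
        match = int(action == context)
        # The signal reports the match, flipped with probability `noise`.
        signal = match if rng.random() >= noise else 1 - match
        history.append((action, signal))
        correct += match
    return correct / horizon


def history_policy(history, rng):
    """Pick the arm that the whole history of signals votes for."""
    votes = [0, 0]
    for action, signal in history:
        # Signal 1 is evidence for the chosen arm, signal 0 for the other arm.
        votes[action if signal == 1 else 1 - action] += 1
    if votes[0] == votes[1]:
        return rng.randrange(2)  # no evidence yet, or a tie
    return 0 if votes[0] > votes[1] else 1


def latching_policy(history, rng):
    """Repeat the previous action and ignore the signals entirely."""
    if not history:
        return rng.randrange(2)
    return history[-1][0]


def average_accuracy(policy, episodes=200, horizon=200, seed=0):
    """Average per-episode accuracy over randomly drawn hidden contexts."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(episodes):
        context = rng.randrange(2)
        total += run_episode(policy, context, horizon, rng)
    return total / episodes


if __name__ == "__main__":
    print("history :", average_accuracy(history_policy))
    print("latching:", average_accuracy(latching_policy))
```

In this sketch the history-based policy should score well above the latching one. Each latching episode is either entirely right or entirely wrong, depending on the first random action, so its average sits near chance.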


