Deterministic and Discriminative Imitation (D2-Imitation): Revisiting Adversarial Imitation for Sample Efficiency

by Mingfei Sun, et al.
University of Oxford

Sample efficiency is crucial for imitation learning methods to be applicable in real-world settings. Many studies improve sample efficiency by extending adversarial imitation to be off-policy, even though these off-policy extensions can either change the original objective or involve complicated optimization. We revisit the foundation of adversarial imitation and propose an off-policy, sample-efficient approach that requires no adversarial training or min-max optimization. Our formulation capitalizes on two key insights: (1) the similarity between the Bellman equation and the stationary state-action distribution equation allows us to derive a novel temporal difference (TD) learning approach; and (2) the use of a deterministic policy simplifies the TD learning. Combined, these insights yield a practical algorithm, Deterministic and Discriminative Imitation (D2-Imitation), which operates by first partitioning samples into two replay buffers and then learning a deterministic policy via off-policy reinforcement learning. Our empirical results show that D2-Imitation achieves good sample efficiency, outperforming several off-policy extensions of adversarial imitation on many control tasks.
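
For readers unfamiliar with the similarity mentioned in insight (1), the two recursions can be written side by side in their standard textbook forms. This is a sketch in generic notation, not necessarily the form used in the paper: the discount factor \gamma, initial state distribution p_0, transition kernel P, reward r, and the normalized discounted occupancy \rho^\pi are assumed symbols that do not appear in the abstract.

```latex
% Bellman equation: the value at (s, a) is defined through its successors (s', a').
Q^{\pi}(s, a) = r(s, a)
  + \gamma \, \mathbb{E}_{s' \sim P(\cdot \mid s, a),\; a' \sim \pi(\cdot \mid s')}
    \left[ Q^{\pi}(s', a') \right]

% Stationary (discounted) state-action distribution: the mass at (s, a)
% is defined through its predecessors (s', a').
\rho^{\pi}(s, a) = (1 - \gamma) \, p_0(s) \, \pi(a \mid s)
  + \gamma \, \pi(a \mid s) \sum_{s', a'} P(s \mid s', a') \, \rho^{\pi}(s', a')
```

Both equations have the shape "immediate term plus \gamma times a one-step expectation of the same quantity", which is what makes a TD-style update plausible for the distribution as well as for the value function. The first recursion runs forward to successor pairs and the second backward to predecessor pairs. Under a deterministic policy, \pi(a \mid s) reduces to a point mass on a single action, which is presumably the simplification referred to in insight (2).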


SoftDICE for Imitation Learning: Rethinking Off-policy Distribution Matching

We present SoftDICE, which achieves state-of-the-art performance for imi...

Off-Policy Imitation Learning from Observations

Learning from Observations (LfO) is a practical reinforcement learning s...

Learning chordal extensions

A highly influential ingredient of many techniques designed to exploit s...

Imitation Learning via Off-Policy Distribution Matching

When performing imitation learning from expert demonstrations, distribut...

Sample-efficient Adversarial Imitation Learning from Observation

Imitation from observation is the framework of learning tasks by observi...

Sample-Efficient Imitation Learning via Generative Adversarial Nets

Recent work in imitation learning articulate their formulation around th...

Provable Representation Learning for Imitation with Contrastive Fourier Features

In imitation learning, it is common to learn a behavior policy to match ...
