XIRL: Cross-embodiment Inverse Reinforcement Learning

by Kevin Zakka, et al.

We investigate the visual cross-embodiment imitation setting, in which agents learn policies from videos of other agents (such as humans) demonstrating the same task, but with stark differences in their embodiments – shape, actions, end-effector dynamics, etc. In this work, we demonstrate that it is possible to automatically discover and learn vision-based reward functions from cross-embodiment demonstration videos that are robust to these differences. Specifically, we present a self-supervised method for Cross-embodiment Inverse Reinforcement Learning (XIRL) that leverages temporal cycle-consistency constraints to learn deep visual embeddings that capture task progression from offline videos of demonstrations across multiple expert agents, each performing the same task differently due to embodiment differences. Prior to our work, producing rewards from self-supervised embeddings has typically required alignment with a reference trajectory, which may be difficult to acquire. We show empirically that if the embeddings are aware of task progress, simply taking the negative distance between the current state and the goal state in the learned embedding space is useful as a reward for training policies with reinforcement learning. We find that our learned reward function not only works for embodiments seen during training, but also generalizes to entirely new embodiments. We also find that XIRL policies are more sample efficient than baselines, and in some cases exceed the sample efficiency of the same agent trained with ground-truth sparse rewards.
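The reward described in the abstract, the negative distance between the current state and the goal state in the learned embedding space, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `embedding_distance_reward`, the use of Euclidean distance, and the hand-written three-dimensional vectors standing in for the outputs of a learned encoder are all assumptions made for illustration.

```python
import math

def embedding_distance_reward(state_emb, goal_emb):
    """Return the negative Euclidean distance between two embeddings.

    Both arguments are equal-length sequences of floats. In practice they
    would come from a learned visual encoder; here they are supplied by hand.
    The choice of Euclidean distance is an assumption of this sketch.
    """
    return -math.dist(state_emb, goal_emb)

# Hypothetical embeddings: a goal state and three frames of a trajectory
# that moves steadily toward it.
goal = [1.0, 0.0, 0.0]
trajectory = [
    [0.0, 0.0, 0.0],  # far from the goal
    [0.5, 0.0, 0.0],  # halfway there
    [1.0, 0.0, 0.0],  # at the goal
]

rewards = [embedding_distance_reward(frame, goal) for frame in trajectory]
```

If the embedding tracks task progress, the reward rises toward zero as the agent nears the goal, giving a dense signal that a reinforcement learning algorithm can maximize.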




What Can Learned Intrinsic Rewards Capture?

Reinforcement learning agents can include different components, such as ...

There Is No Turning Back: A Self-Supervised Approach for Reversibility-Aware Reinforcement Learning

We propose to learn to distinguish reversible from irreversible actions ...

Learning from Suboptimal Demonstration via Self-Supervised Reward Regression

Learning from Demonstration (LfD) seeks to democratize robotics by enabl...

Cut-and-Approximate: 3D Shape Reconstruction from Planar Cross-sections with Deep Reinforcement Learning

Current methods for 3D object reconstruction from a set of planar cross-...

Learning Actionable Representations from Visual Observations

In this work we explore a new approach for robots to teach themselves ab...

Learning Goal-Conditioned Policies Offline with Self-Supervised Reward Shaping

Developing agents that can execute multiple skills by learning from pre-...

LPaintB: Learning to Paint from Self-Supervision

We present a novel reinforcement learning-based natural media painting a...
