Video2Skill: Adapting Events in Demonstration Videos to Skills in an Environment using Cyclic MDP Homomorphisms

by Sumedh A. Sontakke, et al.
University of Southern California

Humans excel at learning long-horizon tasks from demonstrations augmented with textual commentary, as evidenced by the popularity of online tutorial videos. Intuitively, this capability can be separated into two distinct subtasks: first, dividing a long-horizon demonstration sequence into semantically meaningful events; second, adapting those events into meaningful behaviors in one's own environment. Here, we present Video2Skill (V2S), which attempts to extend this capability to artificial agents by allowing a robot arm to learn from human cooking videos. We first use sequence-to-sequence autoencoder-style architectures to learn a temporal latent space for events in long-horizon demonstrations. We then transfer these representations to the robotic target domain, using a small amount of offline and unrelated interaction data (sequences of state-action pairs of the robot arm controlled by an expert) to adapt the events into actionable representations, i.e., skills. Through experiments, we show that our approach results in self-supervised analogy learning, where the agent learns to draw analogies between motions in the human demonstration data and behaviors in the robotic environment. We also demonstrate the efficacy of our approach for model learning, showing how Video2Skill uses prior knowledge from human demonstrations to outperform traditional model learning of long-horizon dynamics. Finally, we demonstrate the utility of our approach for non-tabula-rasa decision-making, i.e., using video demonstrations for zero-shot skill generation.
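To make the first stage concrete, the sketch below shows the shape of a sequence-to-sequence autoencoder that compresses a demonstration segment into a temporal latent "event" embedding and then reconstructs the frame features from it. This is an illustrative minimal example, not the paper's actual architecture: the dimensions, the vanilla-RNN cells, and the random (untrained) weights are all assumptions for exposition; in practice the encoder and decoder would be trained end-to-end with a reconstruction loss over video features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not from the paper):
# T timesteps, D per-frame feature dim, H hidden dim, Z latent dim.
T, D, H, Z = 20, 8, 16, 4

def rnn_encode(x, Wx, Wh, Wz):
    """Run a vanilla RNN over the sequence and project the final
    hidden state to a latent event embedding z."""
    h = np.zeros(Wh.shape[0])
    for t in range(x.shape[0]):
        h = np.tanh(Wx @ x[t] + Wh @ h)
    return Wz @ h

def rnn_decode(z, T, Wz, Wh, Wy):
    """Unroll a decoder RNN from the latent to reconstruct T frames."""
    h = np.tanh(Wz @ z)
    frames = []
    for _ in range(T):
        h = np.tanh(Wh @ h)
        frames.append(Wy @ h)
    return np.stack(frames)

# Randomly initialised weights; a real system would fit these by
# minimising the reconstruction error over many demonstrations.
enc = (rng.normal(size=(H, D)), rng.normal(size=(H, H)), rng.normal(size=(Z, H)))
dec = (rng.normal(size=(H, Z)), rng.normal(size=(H, H)), rng.normal(size=(D, H)))

demo = rng.normal(size=(T, D))   # stand-in for per-frame video features
z = rnn_encode(demo, *enc)       # temporal latent for one event
recon = rnn_decode(z, T, *dec)   # reconstructed frame features

print(z.shape, recon.shape)      # (4,) (20, 8)
```

The fixed-size embedding `z` is the piece that would later be grounded in the robot's domain: a second decoder, conditioned on `z` and trained on the offline state-action data, would emit action sequences instead of reconstructed frames.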

