Chain of Thought Imitation with Procedure Cloning

05/22/2022
by Mengjiao Yang, et al.

Imitation learning aims to extract high-performance policies from logged demonstrations of expert behavior. It is common to frame imitation learning as a supervised learning problem in which one fits a function approximator to the input-output mapping exhibited by the logged demonstrations (input observations to output actions). While framing imitation learning as a supervised input-output learning problem allows for applicability in a wide variety of settings, it is also an overly simplistic view of the problem in situations where the expert demonstrations provide much richer insight into expert behavior. For example, in applications such as path navigation, robot manipulation, and strategy games, expert demonstrations are acquired via planning, search, or some other multi-step algorithm, revealing not just the output action to be imitated but also the procedure for how to determine this action. While these intermediate computations may use tools not available to the agent during inference (e.g., environment simulators), they are nevertheless informative as a way to explain an expert's mapping of states to actions. To properly leverage expert procedure information without relying on the privileged tools the expert may have used to perform the procedure, we propose procedure cloning, which applies supervised sequence prediction to imitate the series of expert computations. This way, procedure cloning learns not only what to do (i.e., the output action), but how and why to do it (i.e., the procedure). Through empirical analysis on navigation, simulated robotic manipulation, and game-playing environments, we show that imitating the intermediate computations of an expert's behavior enables procedure cloning to learn policies exhibiting significant generalization to unseen environment configurations, including those configurations for which running the expert's procedure directly is infeasible.
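The core difference between behavioral cloning and procedure cloning can be illustrated with a toy sketch (not the authors' code, and the helper names here are our own): an expert that plans with breadth-first search on a 1-D grid. Behavioral cloning keeps only the final action as the supervised target, while procedure cloning also records the expert's intermediate computation, here the BFS expansion order, as a sequence-prediction target ending in the action.

```python
from collections import deque

def bfs_plan(start, goal, walls, size):
    """Expert planner: BFS on a 1-D grid of `size` cells.
    Returns the expansion order (the expert's intermediate computation)
    and the first move (-1 or +1) along the shortest path to the goal."""
    parent = {start: None}
    order = []  # cells in the order BFS expands them: the "procedure"
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        order.append(cell)
        if cell == goal:
            break
        for nxt in (cell - 1, cell + 1):
            if 0 <= nxt < size and nxt not in walls and nxt not in parent:
                parent[nxt] = cell
                queue.append(nxt)
    # Backtrack from the goal to recover the first action from `start`.
    cell = goal
    while parent[cell] != start:
        cell = parent[cell]
    return order, cell - start

def make_bc_example(start, goal, walls, size):
    """Behavioral cloning: observation -> output action only."""
    _, action = bfs_plan(start, goal, walls, size)
    return (start, goal), action

def make_pc_example(start, goal, walls, size):
    """Procedure cloning: observation -> sequence of intermediate
    computations, terminated by the output action. A sequence model
    trained on these targets imitates *how* the expert decides, and at
    inference time needs no simulator: it generates the procedure
    token-by-token and reads off the final action."""
    order, action = bfs_plan(start, goal, walls, size)
    return (start, goal), order + [("act", action)]
```

For instance, `make_pc_example(0, 3, set(), 5)` yields the target sequence `[0, 1, 2, 3, ("act", 1)]`, whereas `make_bc_example` keeps only the action `1`; the extra supervision on the expansion trace is what the paper argues drives generalization to unseen configurations.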
