Robotic Skill Acquisition via Instruction Augmentation with Vision-Language Models

11/21/2022
by   Ted Xiao, et al.

In recent years, much progress has been made in learning robotic manipulation policies that follow natural language instructions. Such methods typically learn from corpora of robot-language data that was either collected with specific tasks in mind or expensively relabelled by humans with rich language descriptions in hindsight. Recently, large-scale pretrained vision-language models (VLMs) like CLIP or ViLD have been applied to robotics for learning representations and scene descriptors. Can these pretrained models serve as automatic labelers for robot data, effectively importing Internet-scale knowledge into existing datasets to make them useful even for tasks that are not reflected in their ground-truth annotations? To accomplish this, we introduce Data-driven Instruction Augmentation for Language-conditioned control (DIAL): we use semi-supervised language labels, leveraging the semantic understanding of CLIP, to propagate knowledge onto large datasets of unlabelled demonstration data, and then train language-conditioned policies on the augmented datasets. This method enables cheaper acquisition of useful language descriptions than expensive human labels, allowing for more efficient label coverage of large-scale datasets. We apply DIAL to a challenging real-world robotic manipulation domain where 96.5% of demonstrations do not contain crowd-sourced language annotations. DIAL enables imitation learning policies to acquire new capabilities and generalize to 60 novel instructions unseen in the original dataset.
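The core relabeling step can be illustrated with a minimal sketch: score each unlabelled episode against a pool of candidate instructions using CLIP-style embeddings, and attach the best-matching instructions as hindsight labels. Everything below is a simplified assumption, not the paper's implementation: embeddings are taken as precomputed, L2-normalized vectors (DIAL actually fine-tunes CLIP on a small human-labelled subset first), and the function names and toy data are hypothetical.

```python
import numpy as np

def relabel_episodes(image_embs, text_embs, instructions, top_k=2):
    """Assign each unlabelled episode the top-k candidate instructions
    whose text embeddings best match the episode's image embedding.

    image_embs: (n_episodes, d) L2-normalized image features (assumed precomputed)
    text_embs:  (n_instructions, d) L2-normalized text features (assumed precomputed)
    instructions: list of n_instructions candidate strings
    """
    # For normalized vectors, cosine similarity is a plain dot product.
    sims = image_embs @ text_embs.T               # (n_episodes, n_instructions)
    top = np.argsort(-sims, axis=1)[:, :top_k]    # indices of best-matching instructions
    return [[instructions[j] for j in row] for row in top]

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Toy usage with random stand-ins for real CLIP features.
rng = np.random.default_rng(0)
img = normalize(rng.normal(size=(3, 8)))   # 3 unlabelled episodes
txt = normalize(rng.normal(size=(5, 8)))   # 5 candidate instructions
labels = ["pick up the can", "open the drawer", "push the block left",
          "place the apple in the bowl", "close the drawer"]
print(relabel_episodes(img, txt, labels, top_k=2))
```

The augmented (episode, instruction) pairs would then be fed to an ordinary language-conditioned imitation learning pipeline alongside the original human-labelled data.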


