Masked Autoencoding for Scalable and Generalizable Decision Making

11/23/2022
by   Fangchen Liu, et al.
0

We are interested in learning scalable agents for reinforcement learning that can learn from large-scale, diverse sequential data similar to current large vision and language models. To this end, this paper presents masked decision prediction (MaskDP), a simple and scalable self-supervised pretraining method for reinforcement learning (RL) and behavioral cloning (BC). In our MaskDP approach, we employ a masked autoencoder (MAE) to state-action trajectories, wherein we randomly mask state and action tokens and reconstruct the missing data. By doing so, the model is required to infer masked-out states and actions and extract information about dynamics. We find that masking different proportions of the input sequence significantly helps with learning a better model that generalizes well to multiple downstream tasks. In our empirical study, we find that a MaskDP model gains the capability of zero-shot transfer to new BC tasks, such as single and multiple goal reaching, and it can zero-shot infer skills from a few example transitions. In addition, MaskDP transfers well to offline RL and shows promising scaling behavior w.r.t. to model size. It is amenable to data-efficient finetuning, achieving competitive results with prior methods based on autoregressive pretraining.

READ FULL TEXT

page 6

page 7

page 15

page 16

research
06/26/2023

Supervised Pretraining Can Learn In-Context Reinforcement Learning

Large transformer models trained on diverse datasets have shown a remark...
research
11/29/2021

Improving Zero-shot Generalization in Offline Reinforcement Learning using Generalized Similarity Functions

Reinforcement learning (RL) agents are widely used for solving complex s...
research
02/11/2021

Representation Matters: Offline Pretraining for Sequential Decision Making

The recent success of supervised learning methods on ever larger offline...
research
04/28/2022

Towards Flexible Inference in Sequential Decision Problems via Bidirectional Transformers

Randomly masking and predicting word tokens has been a successful approa...
research
11/23/2022

Multi-Environment Pretraining Enables Transfer to Action Limited Datasets

Using massive datasets to train large-scale models has emerged as a domi...
research
11/20/2022

UniMASK: Unified Inference in Sequential Decision Problems

Randomly masking and predicting word tokens has been a successful approa...
research
10/07/2020

Causal Curiosity: RL Agents Discovering Self-supervised Experiments for Causal Representation Learning

Humans show an innate ability to learn the regularities of the world thr...

Please sign up or login with your details

Forgot password? Click here to reset