Emergent Agentic Transformer from Chain of Hindsight Experience

05/26/2023
by   Hao Liu, et al.

Large transformer models powered by diverse data and model scale have dominated natural language modeling and computer vision and pushed the frontier of multiple AI areas. In reinforcement learning (RL), despite many efforts on transformer-based policies, a key limitation remains: current transformer-based policies cannot learn by directly combining information from multiple sub-optimal trials. In this work, we address this issue using the recently proposed chain of hindsight to relabel experience, training a transformer on a sequence of trajectories sorted in ascending order of total reward. Our method consists of relabelling the target return of each trajectory to the maximum total reward among the trajectories in the sequence and training an autoregressive model to predict actions conditioned on past states, actions, rewards, target returns, and task completion tokens. The resulting model, Agentic Transformer (AT), can learn to improve upon itself at both training and test time. As we show on the D4RL and ExoRL benchmarks, to the best of our knowledge, this is the first time that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches, even from sub-optimal data. Our Agentic Transformer also shows a promising scaling trend: bigger models consistently improve results.
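
The relabelling step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Trajectory`, `RelabeledTrajectory`, and `build_hindsight_chain` are hypothetical, and placing the task completion token only on the final step of the highest-return trajectory is an assumption made here for illustration. The sketch covers only the data preparation (sorting and relabelling), not the transformer training.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Trajectory:
    """One trial: parallel lists of states, actions, and per-step rewards."""
    states: List[int]
    actions: List[int]
    rewards: List[float]

    @property
    def total_reward(self) -> float:
        return sum(self.rewards)


@dataclass
class RelabeledTrajectory:
    """A trajectory plus the hindsight labels attached to it (hypothetical layout)."""
    trajectory: Trajectory
    target_return: float   # relabelled to the best total reward in the chain
    completion: List[int]  # per-step task completion tokens (assumed encoding)


def build_hindsight_chain(trajectories: Sequence[Trajectory]) -> List[RelabeledTrajectory]:
    """Sort trajectories by ascending total reward and relabel their target returns."""
    if not trajectories:
        return []

    # Ascending order, so the chain ends with the best trial.
    ordered = sorted(trajectories, key=lambda t: t.total_reward)
    best_return = ordered[-1].total_reward

    chain = []
    for index, traj in enumerate(ordered):
        completion = [0] * len(traj.rewards)
        # Assumption: only the last step of the best trajectory is marked complete.
        if index == len(ordered) - 1 and completion:
            completion[-1] = 1
        chain.append(RelabeledTrajectory(traj, best_return, completion))
    return chain
```

Under these assumptions, every trajectory in a chain carries the same target return, namely the best total reward observed in that chain, so the sequence the model is trained on shows worse trials followed by better ones under one shared goal.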

research
06/02/2021

Decision Transformer: Reinforcement Learning via Sequence Modeling

We present a framework that abstracts Reinforcement Learning (RL) as a s...
research
08/31/2023

Multi-Objective Decision Transformers for Offline Reinforcement Learning

Offline Reinforcement Learning (RL) is structured to derive policies fro...
research
05/26/2023

Future-conditioned Unsupervised Pretraining for Decision Transformer

Recent research in offline reinforcement learning (RL) has demonstrated ...
research
03/13/2023

Transformer-based World Models Are Happy With 100k Interactions

Deep neural networks have been successful in many reinforcement learning...
research
08/20/2023

Karma: Adaptive Video Streaming via Causal Sequence Modeling

Optimal adaptive bitrate (ABR) decision depends on a comprehensive chara...
research
11/26/2022

How Crucial is Transformer in Decision Transformer?

Decision Transformer (DT) is a recently proposed architecture for Reinfo...
research
04/05/2023

ENTL: Embodied Navigation Trajectory Learner

We propose Embodied Navigation Trajectory Learner (ENTL), a method for e...
