Offline Q-Learning on Diverse Multi-Task Data Both Scales And Generalizes

11/28/2022
by Aviral Kumar et al.

The potential of offline reinforcement learning (RL) is that high-capacity models trained on large, heterogeneous datasets can lead to agents that generalize broadly, analogously to similar advances in vision and NLP. However, recent works argue that offline RL methods encounter unique challenges to scaling up model capacity. Drawing on the lessons from these works, we re-examine previous design choices and find that with appropriate choices (ResNets, cross-entropy-based distributional backups, and feature normalization), offline Q-learning algorithms exhibit strong performance that scales with model capacity. Using multi-task Atari as a testbed for scaling and generalization, we train a single policy on 40 games with near-human performance using up to 80 million parameter networks, finding that model performance scales favorably with capacity. In contrast to prior work, we extrapolate beyond dataset performance even when trained entirely on a large (400M transitions) but highly suboptimal dataset (51% human-level performance). Compared to return-conditioned supervised approaches, offline Q-learning scales similarly with model capacity and has better performance, especially when the dataset is suboptimal. Finally, we show that offline Q-learning with a diverse dataset is sufficient to learn powerful representations that facilitate rapid transfer to novel games and fast online learning on new variations of a training game, improving over existing state-of-the-art representation learning approaches.
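
To give a concrete sense of the "cross-entropy based distributional backups" design choice named in the abstract, below is a minimal NumPy sketch of a C51-style categorical backup: the target return distribution is projected onto a fixed support of atoms and the network is trained with a cross-entropy loss against that projection. The support range, atom count, and function names here are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def project_categorical_target(next_probs, rewards, dones, gamma,
                               v_min=-10.0, v_max=10.0, num_atoms=51):
    """Project the distributional Bellman target r + gamma * z onto a fixed
    categorical support (C51-style sketch; hyperparameters are illustrative)."""
    batch = next_probs.shape[0]
    support = np.linspace(v_min, v_max, num_atoms)   # fixed atom locations z_i
    delta_z = (v_max - v_min) / (num_atoms - 1)

    # Bellman-shifted atom locations, clipped to the support range.
    tz = rewards[:, None] + gamma * (1.0 - dones[:, None]) * support[None, :]
    tz = np.clip(tz, v_min, v_max)

    # Distribute each shifted atom's probability onto its two nearest atoms.
    b = (tz - v_min) / delta_z
    lower, upper = np.floor(b).astype(int), np.ceil(b).astype(int)
    target = np.zeros((batch, num_atoms))
    for i in range(batch):
        for j in range(num_atoms):
            if lower[i, j] == upper[i, j]:
                target[i, lower[i, j]] += next_probs[i, j]
            else:
                target[i, lower[i, j]] += next_probs[i, j] * (upper[i, j] - b[i, j])
                target[i, upper[i, j]] += next_probs[i, j] * (b[i, j] - lower[i, j])
    return target

def cross_entropy_backup_loss(pred_logits, target_probs):
    """Cross-entropy between the projected target distribution and the
    predicted categorical distribution over returns."""
    shifted = pred_logits - pred_logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.mean(np.sum(target_probs * log_probs, axis=1))
```

In practice the target distribution `next_probs` would come from a target network evaluated at the next state's chosen action, and the loss would be minimized over minibatches of offline transitions; treating the backup as a classification problem (cross-entropy over return atoms) rather than a regression is the aspect the abstract highlights as important for scaling.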
