Accelerating exploration and representation learning with offline pre-training

by Bogdan Mazoure et al.

Sequential decision-making agents struggle with long-horizon tasks, since solving them requires multi-step reasoning. Most reinforcement learning (RL) algorithms address this challenge through improved credit assignment, added memory capability, or by altering the agent's intrinsic motivation (i.e., exploration) or its worldview (i.e., knowledge representation). Many of these components can be learned from offline data. In this work, we follow the hypothesis that exploration and representation learning can be improved by separately learning two different models from a single offline dataset. We show that learning a state representation with noise-contrastive estimation and, separately, a model of auxiliary reward from a single collection of human demonstrations can significantly improve sample efficiency on the challenging NetHack benchmark. We also ablate the components of our experimental setup and highlight crucial insights.
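The abstract does not spell out the contrastive objective, but noise-contrastive estimation for state representations is typically instantiated as an InfoNCE-style loss: each state embedding is pulled toward the embedding of its paired (e.g., temporally nearby) state, with the other states in the batch serving as negatives. The sketch below is a minimal, illustrative version of such a loss in NumPy; the function name, batch layout, and temperature value are assumptions, not the paper's implementation.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """Illustrative InfoNCE loss (not the paper's exact objective).

    anchors, positives: (B, D) arrays of embeddings; row i of `positives`
    is the positive pair for row i of `anchors`, and all other rows in the
    batch act as negatives.
    """
    # L2-normalize so the dot product is a cosine similarity.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)

    # (B, B) similarity matrix; entry (i, j) scores anchor i against positive j.
    logits = a @ p.T / temperature
    # Subtract the row max for numerical stability before the softmax.
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The true pairs sit on the diagonal; minimize their negative log-likelihood.
    return -np.mean(np.diag(log_probs))
```

Matched pairs drawn from the same underlying state should yield a much lower loss than unrelated pairs, which is the signal that shapes the learned representation.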


