
Unlocking the Power of Representations in Long-term Novelty-based Exploration

by Alaa Saade, et al.

We introduce Robust Exploration via Clustering-based Online Density Estimation (RECODE), a non-parametric method for novelty-based exploration that estimates visitation counts for clusters of states based on their similarity in a chosen embedding space. By adapting classical clustering to the nonstationary setting of Deep RL, RECODE can efficiently track state visitation counts over thousands of episodes. We further propose a novel generalization of the inverse dynamics loss, which leverages masked transformer architectures for multi-step prediction. In conjunction with RECODE, this loss achieves a new state of the art on a suite of challenging 3D exploration tasks in DM-Hard-8. RECODE also sets a new state of the art in hard exploration Atari games, and is the first agent to reach the end screen in "Pitfall!".
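To make the idea of cluster-level visitation counts concrete, here is a minimal sketch in Python with NumPy. It is not the RECODE algorithm from the paper: the class name `ClusterCounter`, the fixed `radius` threshold, the multiplicative `decay`, the rule of overwriting the least-visited cluster when memory is full, and the `1 / sqrt(count)` bonus are all assumptions chosen for illustration. It only shows the general pattern of assigning an embedding to its nearest cluster, updating that cluster's count, and deriving a novelty bonus from it.

```python
import numpy as np


class ClusterCounter:
    """Illustrative cluster-based visitation counter (hypothetical, not RECODE itself)."""

    def __init__(self, dim, capacity=100, radius=1.0, decay=0.999):
        self.centers = np.zeros((0, dim))  # one row per cluster center
        self.counts = np.zeros(0)          # soft visitation count per cluster
        self.capacity = capacity           # maximum number of clusters kept
        self.radius = radius               # assumed distance threshold for a match
        self.decay = decay                 # assumed forgetting factor for nonstationarity

    def update(self, embedding):
        """Register one visit and return a novelty bonus of 1 / sqrt(count)."""
        x = np.asarray(embedding, dtype=float)
        # Discount old visits so stale clusters gradually lose weight.
        self.counts *= self.decay
        if len(self.counts) > 0:
            dists = np.linalg.norm(self.centers - x, axis=1)
            i = int(np.argmin(dists))
            if dists[i] <= self.radius:
                # Close enough: count the visit and pull the center toward x.
                self.counts[i] += 1.0
                self.centers[i] += (x - self.centers[i]) / self.counts[i]
                return 1.0 / np.sqrt(self.counts[i])
        # No cluster is close enough: start a new one with a count of 1.
        if len(self.counts) < self.capacity:
            self.centers = np.vstack([self.centers, x])
            self.counts = np.append(self.counts, 1.0)
        else:
            # Memory is full: overwrite the least-visited cluster.
            j = int(np.argmin(self.counts))
            self.centers[j] = x
            self.counts[j] = 1.0
        return 1.0
```

In this sketch, repeated visits to nearby embeddings shrink the returned bonus, while an embedding far from every stored center receives the maximum bonus of 1. The choice of embedding space is left open here, as it is in the abstract.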



