Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments

by   Daochen Zha, et al.

Exploration under sparse reward is a long-standing challenge of model-free reinforcement learning. The state-of-the-art methods address this challenge by introducing intrinsic rewards to encourage exploration in novel states or uncertain environment dynamics. Unfortunately, methods based on intrinsic rewards often fall short in procedurally-generated environments, where a different environment is generated in each episode so that the agent is not likely to visit the same state more than once. Motivated by how humans distinguish good exploration behaviors by looking into the entire episode, we introduce RAPID, a simple yet effective episode-level exploration method for procedurally-generated environments. RAPID regards each episode as a whole and gives an episodic exploration score from both per-episode and long-term views. Those highly scored episodes are treated as good exploration behaviors and are stored in a small ranking buffer. The agent then imitates the episodes in the buffer to reproduce the past good exploration behaviors. We demonstrate our method on several procedurally-generated MiniGrid environments, a first-person-view 3D Maze navigation task from MiniWorld, and several sparse MuJoCo tasks. The results show that RAPID significantly outperforms the state-of-the-art intrinsic reward strategies in terms of sample efficiency and final performance. The code is available at https://github.com/daochenzha/rapid


page 3

page 5

page 16

page 17

page 18

page 19

page 20

page 22


RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments

Exploration in sparse reward environments remains one of the key challen...

Long-Term Visitation Value for Deep Exploration in Sparse Reward Reinforcement Learning

Reinforcement learning with sparse rewards is still an open challenge. C...

Long-Term Exploration in Persistent MDPs

Exploration is an essential part of reinforcement learning, which restri...

MADE: Exploration via Maximizing Deviation from Explored Regions

In online reinforcement learning (RL), efficient exploration remains par...

Keeping Your Distance: Solving Sparse Reward Tasks Using Self-Balancing Shaped Rewards

While using shaped rewards can be beneficial when solving sparse reward ...

Multimodal Reward Shaping for Efficient Exploration in Reinforcement Learning

Maintaining long-term exploration ability remains one of the challenges ...

MULEX: Disentangling Exploitation from Exploration in Deep RL

An agent learning through interactions should balance its action selecti...

Please sign up or login with your details

Forgot password? Click here to reset