Lucid Dreaming for Experience Replay: Refreshing Past States with the Current Policy

by Yunshu Du et al.

Experience replay (ER) improves the data efficiency of off-policy reinforcement learning (RL) algorithms by allowing an agent to store and reuse its past experiences in a replay buffer. While many techniques have been proposed to enhance ER by biasing how experiences are sampled from the buffer, thus far they have not considered strategies for refreshing experiences inside the buffer. In this work, we introduce Lucid Dreaming for Experience Replay (LiDER), a conceptually new framework that allows replay experiences to be refreshed by leveraging the agent's current policy. LiDER 1) moves an agent back to a past state; 2) lets the agent try following its current policy to execute different actions—as if the agent were "dreaming" about the past, but is aware of the situation and can control the dream to encounter new experiences; and 3) stores and reuses the new experience if it turned out better than what the agent previously experienced, i.e., to refresh its memories. LiDER is designed to be easily incorporated into off-policy, multi-worker RL algorithms that use ER; we present in this work a case study of applying LiDER to an actor-critic based algorithm. Results show LiDER consistently improves performance over the baseline in four Atari 2600 games. Our open-source implementation of LiDER and the data used to generate all plots in this paper are available at
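The three steps of LiDER described above (revisit a past state, act under the current policy, and keep the new trajectory only if it improves on the old one) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names (`lider_refresh`, `env.reset_to`, `episode_starts`) are hypothetical, and restoring a past state is assumed to be possible via saved emulator snapshots, which is not a standard Gym API.

```python
import random
from collections import namedtuple

# Illustrative transition container; field names are assumptions.
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state", "done"])

def lider_refresh(env, policy, buffer, episode_starts, max_steps=100):
    """One LiDER-style refresh pass (sketch).

    `episode_starts` holds (past_state, old_return) pairs recorded earlier.
    `env.reset_to(state)` is assumed to restore the environment to a stored
    state, e.g., from an emulator snapshot.
    """
    start_state, old_return = random.choice(episode_starts)
    state = env.reset_to(start_state)          # 1) move back to a past state
    trajectory, new_return = [], 0.0
    for _ in range(max_steps):                 # 2) "dream": act with the current policy
        action = policy(state)
        next_state, reward, done = env.step(action)
        trajectory.append(Transition(state, action, reward, next_state, done))
        new_return += reward
        state = next_state
        if done:
            break
    if new_return > old_return:                # 3) refresh memory only if the new
        for t in trajectory:                   #    experience turned out better
            buffer.append(t)
    return new_return
```

In a multi-worker setup, such a refresh loop would run in dedicated workers alongside the regular actors, so the replay buffer is continually updated with trajectories generated by the latest policy.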


