Regret Minimization Experience Replay

by Zhenghai Xue, et al.

Experience replay is widely used in deep off-policy reinforcement learning (RL) algorithms. It stores previously collected samples for later reuse. Prioritized sampling is a promising technique for making better use of these samples and improving the performance of RL agents. Previous prioritization methods based on temporal-difference (TD) error are highly heuristic and diverge from the objective of RL. In this work, we theoretically analyze the optimal prioritization strategy that minimizes the regret of the RL policy. Our theory suggests that data with higher TD error, better on-policiness, and more corrective feedback should be assigned higher weights during sampling. Based on this theory, we propose two practical algorithms, RM-DisCor and RM-TCE. RM-DisCor is a general algorithm, and RM-TCE is a more efficient variant that relies on the temporal ordering of states. Both algorithms improve the performance of off-policy RL algorithms on challenging RL benchmarks, including MuJoCo, Atari, and Meta-World.
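
The qualitative claim in the abstract (higher weight for samples with higher TD error, better on-policiness, and more corrective feedback) can be illustrated with a minimal sketch. This is not the RM-DisCor or RM-TCE algorithm, whose exact weighting is not given in the abstract. The multiplicative combination and the names `priority_weights`, `on_policy_ratio`, and `corrective_score` are assumptions made purely for illustration.

```python
import numpy as np

def priority_weights(td_error, on_policy_ratio, corrective_score, eps=1e-8):
    """Turn three per-sample signals into sampling probabilities.

    Hypothetical combination (an assumption, not the paper's formula):
    the raw weight is the product of |TD error|, an on-policiness score,
    and a corrective-feedback score, so a sample scores high only when
    all three signals are high.
    """
    td = np.abs(np.asarray(td_error, dtype=float))
    ratio = np.asarray(on_policy_ratio, dtype=float)
    corr = np.asarray(corrective_score, dtype=float)
    raw = (td + eps) * ratio * corr
    return raw / raw.sum()  # normalize to a probability distribution

def sample_indices(probs, batch_size, rng):
    """Draw replay-buffer indices with replacement according to probs."""
    return rng.choice(len(probs), size=batch_size, replace=True, p=probs)

# Toy buffer of four transitions with made-up signal values.
rng = np.random.default_rng(0)
td_error = np.array([0.1, 2.0, 0.5, 1.0])
on_policy_ratio = np.array([1.0, 0.9, 0.2, 1.0])   # higher = closer to current policy
corrective_score = np.array([1.0, 1.0, 1.0, 0.5])  # higher = more reliable target

probs = priority_weights(td_error, on_policy_ratio, corrective_score)
batch = sample_indices(probs, batch_size=32, rng=rng)
```

In this toy buffer the second transition has a large TD error and is also close to on-policy with a reliable target, so it receives the largest sampling probability. The third transition is down-weighted because it is far off-policy, even though its TD error is moderate.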

