Safe and Robust Experience Sharing for Deterministic Policy Gradient Algorithms

07/27/2022
by Baturay Saglam, et al.

Learning in high-dimensional continuous tasks is challenging, particularly when the experience replay memory is very limited. We introduce a simple yet effective experience sharing mechanism for deterministic policies in continuous action domains, aimed at future off-policy deep reinforcement learning applications in which the memory allocated to the experience replay buffer is limited. To overcome the extrapolation error induced by learning from other agents' experiences, we equip our algorithm with a novel off-policy correction technique that requires no action probability estimates. We evaluate our method on challenging OpenAI Gym continuous control tasks and conclude that it achieves safe experience sharing across multiple agents and exhibits robust performance when the replay memory is strictly limited.

research
11/02/2021

Off-Policy Correction for Deep Deterministic Policy Gradient Algorithms via Batch Prioritized Experience Replay

The experience replay mechanism allows agents to use the experiences mul...
research
05/18/2022

Neighborhood Mixup Experience Replay: Local Convex Interpolation for Improved Sample Efficiency in Continuous Control Tasks

Experience replay plays a crucial role in improving the sample efficienc...
research
07/30/2022

Reinforcement learning with experience replay and adaptation of action dispersion

Effective reinforcement learning requires a proper balance of exploratio...
research
04/23/2018

Distributed Distributional Deterministic Policy Gradients

This work adopts the very successful distributional perspective on reinf...
research
03/03/2023

Eventual Discounting Temporal Logic Counterfactual Experience Replay

Linear temporal logic (LTL) offers a simplified way of specifying tasks ...
research
03/04/2021

Improving Computational Efficiency in Visual Reinforcement Learning via Stored Embeddings

Recent advances in off-policy deep reinforcement learning (RL) have led ...
research
12/08/2021

Replay For Safety

Experience replay is a widely used technique to achieve efficient...