Parallel Actors and Learners: A Framework for Generating Scalable RL Implementations

by Chi Zhang, et al.

Reinforcement Learning (RL) has achieved significant success in application domains such as robotics, games, and health care. However, training RL agents is very time consuming. Current implementations exhibit poor performance due to challenges such as irregular memory accesses and synchronization overheads. In this work, we propose a framework for generating scalable reinforcement learning implementations on multicore systems. The replay buffer is a key component of RL algorithms: it stores samples obtained from environment interactions and supports sampling them for the learning process. We define a new data structure for the prioritized replay buffer based on a K-ary sum tree that supports asynchronous parallel insertions, sampling, and priority updates. To address the challenge of irregular memory accesses, we propose a novel data layout for the nodes of the sum tree that reduces the number of cache misses. Additionally, we propose a lazy writing mechanism to reduce the synchronization overheads of the replay buffer. Our framework employs parallel actors to concurrently collect data via environment interactions, and parallel learners to perform stochastic gradient descent using the collected data. Our framework supports a wide range of reinforcement learning algorithms, including DQN, DDPG, TD3, and SAC. We demonstrate the effectiveness of our framework in accelerating RL algorithms by performing experiments on a CPU + GPU platform using OpenAI benchmarks. Our results show that the performance of our approach scales linearly with the number of cores. Compared with the baseline approaches, we reduce the convergence time by 3.1x∼10.8x. By plugging our replay buffer implementation into existing open source reinforcement learning frameworks, we achieve 1.1x∼2.1x speedup for sequential executions.
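The K-ary sum tree underlying the prioritized replay buffer can be sketched as follows. This is a minimal, single-threaded Python illustration (class and method names are ours, not the paper's): leaves hold per-sample priorities, each internal node stores the sum of its children, so priority-proportional sampling and priority updates both walk one root-to-leaf path of length O(log_K N). The paper's cache-friendly node layout, asynchronous parallel operations, and lazy-writing mechanism are not reproduced here.

```python
import random


class KarySumTree:
    """Illustrative K-ary sum tree for prioritized replay sampling.

    The tree is stored implicitly in an array: the children of node j
    are nodes k*j+1 .. k*j+k, and the leaves occupy the last level.
    """

    def __init__(self, capacity, k=4):
        self.k = k
        # Pad the leaf level to a full power of K for a simple implicit layout.
        leaves = 1
        while leaves < capacity:
            leaves *= k
        self.capacity = capacity
        self.first_leaf = (leaves - 1) // (k - 1)  # number of internal nodes
        self.tree = [0.0] * (self.first_leaf + leaves)

    def update(self, index, priority):
        """Set the priority of leaf `index` and propagate the delta upward."""
        node = self.first_leaf + index
        delta = priority - self.tree[node]
        while node >= 0:
            self.tree[node] += delta
            node = (node - 1) // self.k if node > 0 else -1

    def sample(self):
        """Draw a leaf index with probability proportional to its priority."""
        target = random.random() * self.tree[0]  # total priority mass at root
        node = 0
        while node < self.first_leaf:  # descend until a leaf is reached
            last = self.k * node + self.k
            for child in range(self.k * node + 1, last + 1):
                if target <= self.tree[child] or child == last:
                    node = child
                    break
                target -= self.tree[child]
        return node - self.first_leaf
```

A larger K makes the tree shallower (fewer levels to traverse per update or sample) at the cost of scanning more children per level, which is the trade-off a cache-line-aware node layout exploits.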




Related papers:

- Analysis of Stochastic Processes through Replay Buffers — "Replay buffers are a key component in many reinforcement learning scheme…"
- Rethinking Population-assisted Off-policy Reinforcement Learning — "While off-policy reinforcement learning (RL) algorithms are sample effic…"
- Virtual Replay Cache — "Return caching is a recent strategy that enables efficient minibatch tra…"
- Event Tables for Efficient Experience Replay — "Experience replay (ER) is a crucial component of many deep reinforcement…"
- An Empirical Study of the Effectiveness of Using a Replay Buffer on Mode Discovery in GFlowNets — "Reinforcement Learning (RL) algorithms aim to learn an optimal policy by…"
- Reverb: A Framework For Experience Replay — "A central component of training in Reinforcement Learning (RL) is Experi…"
- Collect & Infer – a fresh look at data-efficient Reinforcement Learning — "This position paper proposes a fresh look at Reinforcement Learning (RL)…"
