Parallel Q-Learning: Scaling Off-policy Reinforcement Learning under Massively Parallel Simulation

07/24/2023
by Zechu Li, et al.

Reinforcement learning is time-consuming for complex tasks due to the need for large amounts of training data. Recent advances in GPU-based simulation, such as Isaac Gym, have sped up data collection by thousands of times on a commodity GPU. Most prior work has used on-policy methods like PPO due to their simplicity and ease of scaling. Off-policy methods are more data-efficient but challenging to scale, resulting in longer wall-clock training time. This paper presents a Parallel Q-Learning (PQL) scheme that outperforms PPO in wall-clock time while maintaining the superior sample efficiency of off-policy learning. PQL achieves this by parallelizing data collection, policy learning, and value learning. Unlike prior work on distributed off-policy learning, such as Ape-X, our scheme is designed specifically for massively parallel GPU-based simulation and optimized to work on a single workstation. In experiments, we demonstrate that Q-learning can be scaled to tens of thousands of parallel environments and investigate important factors affecting learning speed. The code is available at https://github.com/Improbable-AI/pql.
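
To make the idea of running data collection, value learning, and policy learning concurrently more concrete, here is a minimal toy sketch in Python. It is not the authors' implementation and does not use the API of the linked repository: the function name run_parallel_sketch, the scalar stand-ins for network weights, the random "environment", and all parameter names are assumptions made for illustration only. It shows just the structure: three workers that run at the same time and communicate through a shared replay buffer and shared parameters.

```python
import random
import threading
from collections import deque


def run_parallel_sketch(num_envs=8, actor_steps=50, learner_steps=50, seed=0):
    """Toy illustration (hypothetical): three concurrent workers, one buffer."""
    rng = random.Random(seed)
    buffer = deque(maxlen=10_000)        # shared replay buffer
    lock = threading.Lock()              # guards the buffer and shared values
    shared = {"policy": 0.0, "q": 0.0}   # scalar stand-ins for network weights
    counts = {"collected": 0, "value_steps": 0, "policy_steps": 0}

    def actor():
        # Data collection: each iteration steps num_envs fake environments
        # with the latest policy value and stores the transitions.
        for _ in range(actor_steps):
            with lock:
                policy = shared["policy"]
                batch = [(rng.random(), policy, rng.random())
                         for _ in range(num_envs)]
                buffer.extend(batch)
                counts["collected"] += num_envs

    def value_learner():
        # Value learning: sample one stored transition (if any exist yet)
        # and move the Q stand-in toward its reward.
        for _ in range(learner_steps):
            with lock:
                if buffer:
                    _, _, reward = rng.choice(buffer)
                    shared["q"] += 0.1 * (reward - shared["q"])
                counts["value_steps"] += 1

    def policy_learner():
        # Policy learning: move the policy stand-in toward the current Q value.
        for _ in range(learner_steps):
            with lock:
                shared["policy"] += 0.1 * (shared["q"] - shared["policy"])
                counts["policy_steps"] += 1

    threads = [threading.Thread(target=worker)
               for worker in (actor, value_learner, policy_learner)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counts, len(buffer)


if __name__ == "__main__":
    print(run_parallel_sketch())
```

In this sketch none of the workers waits for another to finish a full phase; each only takes the lock briefly to read or write shared state. A real system would replace the scalars with neural networks and the fake transitions with GPU simulator rollouts, and would need a deliberate choice about how often each worker runs relative to the others.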

research
03/29/2023

Pgx: Hardware-accelerated parallel game simulation for reinforcement learning

We propose Pgx, a collection of board game simulators written in JAX. Th...
research
04/30/2020

Reinforcement Learning with Augmented Data

Learning from visual observations is a fundamental yet challenging probl...
research
01/18/2019

WALL-E: An Efficient Reinforcement Learning Research Framework

There are two halves to RL systems: experience collection time and polic...
research
06/13/2023

Galactic: Scaling End-to-End Reinforcement Learning for Rearrangement at 100k Steps-Per-Second

We present Galactic, a large-scale simulation and reinforcement-learning...
research
04/14/2022

Accelerated Policy Learning with Parallel Differentiable Simulation

Deep reinforcement learning can generate complex control policies, but r...
research
06/21/2020

Sample Factory: Egocentric 3D Control from Pixels at 100000 FPS with Asynchronous Reinforcement Learning

Increasing the scale of reinforcement learning experiments has allowed r...
research
07/03/2023

A Parallel-In-Time Adjoint Sensitivity Analysis for a B6 Bridge-Motor Supply Circuit

This paper presents a parallel-in-time adjoint sensitivity analysis whic...
