Accelerated Quality-Diversity for Robotics through Massive Parallelism

by   Bryan Lim, et al.

Quality-Diversity (QD) algorithms are a well-known approach to generate large collections of diverse and high-quality policies. However, QD algorithms are also known to be data-inefficient, requiring large amounts of computational resources and are slow when used in practice for robotics tasks. Policy evaluations are already commonly performed in parallel to speed up QD algorithms but have limited capabilities on a single machine as most physics simulators run on CPUs. With recent advances in simulators that run on accelerators, thousands of evaluations can performed in parallel on single GPU/TPU. In this paper, we present QDax, an implementation of MAP-Elites which leverages massive parallelism on accelerators to make QD algorithms more accessible. We first demonstrate the improvements on the number of evaluations per second that parallelism using accelerated simulators can offer. More importantly, we show that QD algorithms are ideal candidates and can scale with massive parallelism to be run at interactive timescales. The increase in parallelism does not significantly affect the performance of QD algorithms, while reducing experiment runtimes by two factors of magnitudes, turning days of computation into minutes. These results show that QD can now benefit from hardware acceleration, which contributed significantly to the bloom of deep learning.


page 4

page 7


evosax: JAX-based Evolution Strategies

The deep learning revolution has greatly been accelerated by the 'hardwa...

PyTorch Distributed: Experiences on Accelerating Data Parallel Training

This paper presents the design, implementation, and evaluation of the Py...

Multiple Hands Make Light Work: Enhancing Quality and Diversity using MAP-Elites with Multiple Parallel Evolution Strategies

With the development of hardware accelerators and their corresponding to...

Using Hierarchical Parallelism to Accelerate the Solution of Many Small Partial Differential Equations

This paper presents efforts to improve the hierarchical parallelism of a...

Going faster to see further: GPU-accelerated value iteration and simulation for perishable inventory control using JAX

Value iteration can find the optimal replenishment policy for a perishab...

Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning

In this work, we present and study a training set-up that achieves fast ...

Synthesizing Optimal Parallelism Placement and Reduction Strategies on Hierarchical Systems for Deep Learning

We present a novel characterization of the mapping of multiple paralleli...

Please sign up or login with your details

Forgot password? Click here to reset