Effective Diversity in Population-Based Reinforcement Learning

02/03/2020
by Jack Parker-Holder, et al.

Maintaining a population of solutions has been shown to increase exploration in reinforcement learning, an effect typically attributed to the greater diversity of behaviors considered. One such class of methods, novelty search, further boosts diversity across agents via a multi-objective optimization formulation. Despite their intuitive appeal, these mechanisms have several shortcomings. First, they make use of mean field updates, which induce cycling behaviors. Second, they often rely on handcrafted behavior characterizations, which require domain knowledge. Third, boosting diversity often has a detrimental impact on optimizing already fruitful behaviors for rewards, and the relative importance of the novelty and reward terms is usually hardcoded or requires tedious tuning or annealing. In this paper, we introduce a novel measure of population-wide diversity, leveraging ideas from Determinantal Point Processes. We combine this measure in a principled fashion with the reward function to adapt the degree of diversity during training, borrowing ideas from online learning. Combined with task-agnostic behavioral embeddings, we show this approach outperforms previous methods for multi-objective optimization, as well as vanilla algorithms that optimize solely for rewards.
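
To make the idea of a determinant-based, population-wide diversity measure concrete, here is a minimal sketch and not the paper's implementation. It assumes each agent is summarized by a fixed-length behavioral embedding, that similarity is measured with a squared-exponential kernel, and that reward and diversity are mixed with a fixed weight `lam`. The kernel choice, the function names, and the fixed `lam` are all illustrative assumptions; the abstract only states that the measure draws on Determinantal Point Processes and that the trade-off is adapted during training.

```python
import math


def rbf_kernel_matrix(embeddings, length_scale=1.0):
    """Pairwise squared-exponential similarities (assumed kernel choice)."""
    n = len(embeddings)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2
                          for a, b in zip(embeddings[i], embeddings[j]))
            K[i][j] = math.exp(-sq_dist / (2.0 * length_scale ** 2))
    return K


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-15:
            return 0.0  # numerically singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def population_diversity(embeddings, length_scale=1.0):
    """DPP-style score: determinant of the population's kernel matrix."""
    return determinant(rbf_kernel_matrix(embeddings, length_scale))


def combined_objective(rewards, embeddings, lam):
    """Hypothetical fixed-weight mix of mean reward and diversity."""
    mean_reward = sum(rewards) / len(rewards)
    return (1.0 - lam) * mean_reward + lam * population_diversity(embeddings)
```

With this kernel the determinant is zero when two agents have identical embeddings and approaches one when all embeddings are far apart, so the score penalizes redundancy across the whole population rather than pairwise distance to a neighbor. In this sketch `lam` is a constant; an adaptive scheme would update it during training.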

research
09/17/2018

Generalizing Across Multi-Objective Reward Functions in Deep Reinforcement Learning

Many reinforcement-learning researchers treat the reward function as a p...
research
10/15/2021

Effects of Different Optimization Formulations in Evolutionary Reinforcement Learning on Diverse Behavior Generation

Generating various strategies for a given task is challenging. However, ...
research
03/02/2022

Learning in Sparse Rewards settings through Quality-Diversity algorithms

In the Reinforcement Learning (RL) framework, the learning is guided thr...
research
03/10/2018

Enhanced Optimization with Composite Objectives and Novelty Selection

An important benefit of multi-objective search is that it maintains a di...
research
10/06/2021

From STL Rulebooks to Rewards

The automatic synthesis of neural-network controllers for autonomous age...
research
11/22/2022

Efficient Exploration using Model-Based Quality-Diversity with Gradients

Exploration is a key challenge in Reinforcement Learning, especially in ...
research
08/24/2023

Predator-prey survival pressure is sufficient to evolve swarming behaviors

The comprehension of how local interactions arise in global collective b...
