Bag of Policies for Distributional Deep Exploration

08/03/2023
by Asen Nachkov, et al.

Efficient exploration in complex environments remains a major challenge for reinforcement learning (RL). In contrast to previous Thompson sampling-inspired mechanisms that enable temporally extended exploration, i.e., deep exploration, we focus on deep exploration in distributional RL. We develop a general-purpose approach, Bag of Policies (BoP), that can be built on top of any return distribution estimator by maintaining a population of its copies. BoP consists of an ensemble of multiple heads that are updated independently. During training, each episode is controlled by only one of the heads, and the collected state-action pairs are used to update all heads off-policy, leading to distinct learning signals for each head, which diversify learning and behaviour. To test whether an optimistic ensemble method can improve distributional RL as it did scalar RL (e.g., Bootstrapped DQN), we implement the BoP approach with a population of distributional actor-critics using Bayesian Distributional Policy Gradients (BDPG). The population thus approximates a posterior distribution of return distributions along with a posterior distribution of policies. Another benefit of building upon BDPG is that it allows global posterior uncertainty and a local curiosity bonus to be analyzed simultaneously for exploration. As BDPG is already an optimistic method, this pairing helps to investigate whether optimism can be accumulated in distributional RL. Overall, BoP results in greater robustness and speed during learning, as demonstrated by our experimental results on ALE Atari games.
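
The training scheme described in the abstract (one head controls each episode, and every head then learns off-policy from the collected data) can be illustrated with a minimal sketch. This is not the authors' implementation: it swaps the distributional actor-critics of BDPG for plain tabular scalar Q-learning, closer in spirit to Bootstrapped DQN, and the toy chain environment, the constants, and all function names are assumptions made for illustration only.

```python
import random

# All names and constants below are hypothetical, chosen for illustration.
N_STATES = 5       # toy chain: states 0..4, reward only on reaching state 4
N_ACTIONS = 2      # 0 = left, 1 = right
N_HEADS = 4        # population size (assumed value)
GAMMA, ALPHA = 0.9, 0.5


def step(state, action):
    """Toy chain dynamics: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def make_heads(rng):
    # Each head gets its own random initialisation so the heads start diverse.
    return [[[rng.uniform(0.0, 0.1) for _ in range(N_ACTIONS)]
             for _ in range(N_STATES)] for _ in range(N_HEADS)]


def greedy(q, state):
    return max(range(N_ACTIONS), key=lambda a: q[state][a])


def run_episode(heads, rng, max_steps=50):
    # One head, sampled uniformly, controls the whole episode.
    actor = rng.randrange(len(heads))
    state, transitions = 0, []
    for _ in range(max_steps):
        action = greedy(heads[actor], state)
        next_state, reward, done = step(state, action)
        transitions.append((state, action, reward, next_state, done))
        state = next_state
        if done:
            break
    return actor, transitions


def update_all_heads(heads, transitions):
    # Every head learns off-policy from the same transitions, but each
    # bootstraps from its own estimates, so the targets differ per head.
    for q in heads:
        for s, a, r, s2, done in transitions:
            target = r if done else r + GAMMA * max(q[s2])
            q[s][a] += ALPHA * (target - q[s][a])


def train(episodes=200, seed=0):
    rng = random.Random(seed)
    heads = make_heads(rng)
    for _ in range(episodes):
        _, transitions = run_episode(heads, rng)
        update_all_heads(heads, transitions)
    return heads
```

Calling `train()` returns the list of per-head Q-tables. The point of the sketch is the structure of the loop: the acting head changes from episode to episode, while `update_all_heads` gives each head a different learning signal from identical data because each target depends on that head's own value estimates.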


research
03/20/2021

Bayesian Distributional Policy Gradients

Distributional Reinforcement Learning (RL) maintains the entire probabil...
research
01/09/2020

Addressing Value Estimation Errors in Reinforcement Learning with a State-Action Return Distribution Function

In current reinforcement learning (RL) methods, function approximation e...
research
06/11/2018

The Potential of the Return Distribution for Exploration in RL

This paper studies the potential of the return distribution for explorat...
research
09/29/2022

How Does Value Distribution in Distributional Reinforcement Learning Help Optimization?

We consider the problem of learning a set of probability distributions f...
research
05/13/2019

Distributional Reinforcement Learning for Efficient Exploration

In distributional reinforcement learning (RL), the estimated distributio...
research
07/10/2019

Striving for Simplicity in Off-policy Deep Reinforcement Learning

Reflecting on the advances of off-policy deep reinforcement learning (RL...
research
06/29/2022

Cyclical Kernel Adaptive Metropolis

We propose cKAM, cyclical Kernel Adaptive Metropolis, which incorporates...
