Bayesian Exploration Networks

by   Mattie Fellows, et al.

Bayesian reinforcement learning (RL) offers a principled and elegant approach for sequential decision making under uncertainty. Most notably, Bayesian agents do not face an exploration/exploitation dilemma, a major pathology of frequentist methods. A key challenge for Bayesian RL is the computational complexity of learning Bayes-optimal policies, which is only tractable in toy domains. In this paper we propose a novel model-free approach to address this challenge. Rather than modelling uncertainty in high-dimensional state transition distributions as model-based approaches do, we model uncertainty in a one-dimensional Bellman operator. Our theoretical analysis reveals that existing model-free approaches either do not propagate epistemic uncertainty through the MDP or optimise over a set of contextual policies instead of all history-conditioned policies. Both approximations yield policies that can be arbitrarily Bayes-suboptimal. To overcome these issues, we introduce the Bayesian exploration network (BEN) which uses normalising flows to model both the aleatoric uncertainty (via density estimation) and epistemic uncertainty (via variational inference) in the Bellman operator. In the limit of complete optimisation, BEN learns true Bayes-optimal policies, but like in variational expectation-maximisation, partial optimisation renders our approach tractable. Empirical results demonstrate that BEN can learn true Bayes-optimal policies in tasks where existing model-free approaches fail.


page 1

page 2

page 3

page 4


Bayesian Bellman Operators

We introduce a novel perspective on Bayesian reinforcement learning (RL)...

Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search

Bayesian model-based reinforcement learning is a formally elegant approa...

Disentangling Epistemic and Aleatoric Uncertainty in Reinforcement Learning

Characterizing aleatoric and epistemic uncertainty on the predicted rewa...

Model-Based Policy Gradients with Parameter-Based Exploration by Least-Squares Conditional Density Estimation

The goal of reinforcement learning (RL) is to let an agent learn an opti...

Bayes-Adaptive Deep Model-Based Policy Optimisation

We introduce a Bayesian (deep) model-based reinforcement learning method...

Importance Weighting Approach in Kernel Bayes' Rule

We study a nonparametric approach to Bayesian computation via feature me...

On Uncertainty in Deep State Space Models for Model-Based Reinforcement Learning

Improved state space models, such as Recurrent State Space Models (RSSMs...

Please sign up or login with your details

Forgot password? Click here to reset