VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning

10/18/2019
by Luisa Zintgraf, et al.

Trading off exploration and exploitation in an unknown environment is key to maximising expected return during learning. A Bayes-optimal policy, which does so optimally, conditions its actions not only on the environment state but also on the agent's uncertainty about the environment. Computing a Bayes-optimal policy is, however, intractable for all but the smallest tasks. In this paper, we introduce variational Bayes-Adaptive Deep RL (variBAD), a way to meta-learn to perform approximate inference in an unknown environment and to incorporate task uncertainty directly during action selection. In a grid-world domain, we illustrate how variBAD performs structured online exploration as a function of task uncertainty. We also evaluate variBAD on MuJoCo domains widely used in meta-RL and show that it achieves higher return during training than existing methods.
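
To make the idea of a policy that conditions on uncertainty concrete, the following is a minimal sketch on a toy problem, not variBAD itself. It assumes a two-armed Bernoulli bandit with Beta posteriors and computes the Bayes-optimal expected return exactly by recursing over beliefs. The function name `bayes_value` and the belief encoding (a tuple of `(successes, failures)` counts per arm) are hypothetical choices made for illustration. The number of reachable beliefs grows quickly with the horizon, which hints at why exact computation does not scale beyond very small tasks.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def bayes_value(belief, steps_left):
    """Bayes-optimal expected return from `belief` with `steps_left` pulls left.

    `belief` is a tuple of (alpha, beta) Beta-posterior counts, one pair per arm.
    This is an illustrative toy, assuming rewards of 0 or 1 and no discounting.
    """
    if steps_left == 0:
        return 0.0
    best = 0.0
    for arm, (a, b) in enumerate(belief):
        p = a / (a + b)  # posterior mean probability that this arm pays 1
        # Beliefs after observing a reward of 1 or 0 on this arm.
        win = belief[:arm] + ((a + 1, b),) + belief[arm + 1:]
        lose = belief[:arm] + ((a, b + 1),) + belief[arm + 1:]
        # Expected return of pulling this arm, then acting optimally
        # with respect to the updated belief.
        q = p * (1.0 + bayes_value(win, steps_left - 1)) \
            + (1.0 - p) * bayes_value(lose, steps_left - 1)
        best = max(best, q)
    return best


uniform = ((1, 1), (1, 1))  # Beta(1, 1) prior on both arms
one_step = bayes_value(uniform, 1)
two_step = bayes_value(uniform, 2)
```

With a uniform prior, the two-step value exceeds twice the one-step value because the second action can adapt to what the first pull revealed. variBAD replaces this exact belief with a learned, approximate posterior over a latent task variable, as described in the abstract above.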


research
06/04/2023

ContraBAR: Contrastive Bayes-Adaptive Deep RL

In meta reinforcement learning (meta RL), an agent seeks a Bayes-optimal...
research
08/06/2020

Offline Meta Reinforcement Learning

Consider the following problem, which we term Offline Meta Reinforcement...
research
01/11/2021

Deep Interactive Bayesian Reinforcement Learning via Meta-Learning

Agents that interact with other agents often do not know a priori what t...
research
09/17/2021

Knowledge is reward: Learning optimal exploration by predictive reward cashing

There is a strong link between the general concept of intelligence and t...
research
01/01/2020

Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies

We propose and address a novel few-shot RL problem, where a task is char...
research
10/29/2021

Doubly Robust Interval Estimation for Optimal Policy Evaluation in Online Learning

Evaluating the performance of an ongoing policy plays a vital role in ma...
research
05/28/2019

Learning Efficient and Effective Exploration Policies with Counterfactual Meta Policy

A fundamental issue in reinforcement learning algorithms is the balance ...
