Model Based Meta Learning of Critics for Policy Gradients

04/05/2022
by Sarah Bechtle, et al.

Being able to seamlessly generalize across different tasks is fundamental for robots to act in our world. However, learning representations that generalize quickly to new scenarios is still an open research problem in reinforcement learning. In this paper, we present a framework to meta-learn the critic for gradient-based policy learning. Concretely, we propose a model-based bi-level optimization algorithm that updates the critic's parameters such that the policy learned with the updated critic gets closer to solving the meta-training tasks. We illustrate that our algorithm leads to learned critics that resemble the ground-truth Q-function for a given task. Finally, after meta-training, the learned critic can be used to learn new policies for new, unseen tasks and environment settings via model-free policy gradient optimization, without requiring a model. We present results that show how well the learned critic generalizes to new tasks and dynamics when it is used to learn a new policy in a new scenario.
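
The bi-level structure described above can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the 1-D dynamics `s' = s + a`, a policy that is a single action parameter `theta`, a quadratic critic with one learnable scalar `phi`, hand-derived gradients in place of automatic differentiation, and the step sizes. In the inner loop the policy ascends the critic's gradient; in the outer loop the critic parameter is updated by differentiating the task loss of the updated policy through the known model.

```python
# Toy sketch only: the dynamics, critic form, step sizes, and all names here
# are illustrative assumptions, not the setup used in the paper.

def model(s, a):
    """Assumed known, differentiable dynamics model: s' = s + a."""
    return s + a

def critic_grad_action(phi, a, s0, goal):
    """dQ/da for the assumed critic Q_phi(a) = -(a - phi * (goal - s0))**2."""
    return -2.0 * (a - phi * (goal - s0))

def policy_step(theta, phi, s0, goal, alpha):
    """Inner loop: one gradient-ascent step of the policy on the critic.
    Note that the dynamics model is not used here."""
    return theta + alpha * critic_grad_action(phi, theta, s0, goal)

def meta_train(goals, s0=0.0, alpha=0.1, beta=0.25, iters=300):
    """Outer loop: update phi so that the critic-updated policies
    come closer to solving the meta-training tasks."""
    phi = 0.0
    thetas = [0.0 for _ in goals]  # one policy parameter per training task
    for _ in range(iters):
        grad_phi = 0.0
        for theta, goal in zip(thetas, goals):
            theta_new = policy_step(theta, phi, s0, goal, alpha)
            # Task loss evaluated through the model: (s' - goal)**2.
            task_error = model(s0, theta_new) - goal
            # Chain rule: dL/dphi = dL/dtheta_new * dtheta_new/dphi.
            dtheta_dphi = 2.0 * alpha * (goal - s0)
            grad_phi += 2.0 * task_error * dtheta_dphi
        phi -= beta * grad_phi / len(goals)
        # Keep training the policies with the freshly updated critic.
        thetas = [policy_step(t, phi, s0, g, alpha)
                  for t, g in zip(thetas, goals)]
    return phi

def learn_policy(phi, goal, s0=0.0, alpha=0.1, steps=100):
    """Meta-test: learn a new policy from the critic alone (model-free)."""
    theta = 0.0
    for _ in range(steps):
        theta = policy_step(theta, phi, s0, goal, alpha)
    return theta

if __name__ == "__main__":
    phi = meta_train(goals=[1.0, -1.0, 2.0])
    theta = learn_policy(phi, goal=3.0)  # goal not seen during meta-training
    print("learned critic parameter:", phi)
    print("action learned for unseen goal:", theta)
```

In this toy setting the ground-truth optimum corresponds to `phi = 1`, so the learned value of `phi` gives a rough check on whether the meta-learned critic resembles the true one. A realistic implementation would replace the hand-derived gradients with automatic differentiation through a learned or simulated model.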
