Mitigating Planner Overfitting in Model-Based Reinforcement Learning

12/03/2018
by Dilip Arumugam, et al.

An agent with an inaccurate model of its environment faces a difficult choice: it can ignore the errors in its model and act in the real world in whatever way it determines is optimal with respect to its model. Alternatively, it can take a more conservative stance and eschew its model in favor of optimizing its behavior solely via real-world interaction. This latter approach can be exceedingly slow to learn from experience, while the former can lead to "planner overfitting" - aspects of the agent's behavior are optimized to exploit errors in its model. This paper explores an intermediate position in which the planner seeks to avoid overfitting through a kind of regularization of the plans it considers. We present three different approaches that demonstrably mitigate planner overfitting in reinforcement-learning environments.
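
As a concrete, informal illustration of what "regularization of the plans" can mean, the sketch below plans inside a learned tabular model using a discount factor smaller than the one used for evaluation, so the planner relies less on long-horizon predictions from an inaccurate model. This is only one plausible instance of the general idea, not the paper's specific algorithms; the function names (`value_iteration`, `plan_with_reduced_discount`) and the reduced-discount choice are illustrative assumptions.

```python
# Minimal sketch: regularize planning against model error by using a
# planning discount smaller than the evaluation discount. Hypothetical
# names; not taken from the paper's code.
import numpy as np

def value_iteration(P, R, gamma, tol=1e-8, max_iters=10_000):
    """Greedy policy for a tabular model.

    P: (A, S, S) estimated transition probabilities.
    R: (A, S) estimated expected rewards.
    gamma: discount factor used *for planning*.
    """
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    for _ in range(max_iters):
        Q = R + gamma * (P @ V)        # shape (A, S)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    return Q.argmax(axis=0)            # greedy action per state, shape (S,)

def plan_with_reduced_discount(P_hat, R_hat, gamma_eval, shrink=0.8):
    """Plan in the learned model with gamma_plan < gamma_eval so the
    planner cannot over-exploit compounding model errors."""
    gamma_plan = shrink * gamma_eval
    return value_iteration(P_hat, R_hat, gamma_plan)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A, S = 2, 5
    # A made-up "learned" model standing in for an inaccurate estimate.
    P_hat = rng.dirichlet(np.ones(S), size=(A, S))
    R_hat = rng.normal(size=(A, S))
    policy = plan_with_reduced_discount(P_hat, R_hat, gamma_eval=0.99)
    print("greedy actions per state:", policy)
```

Shrinking the planning discount shortens the effective planning horizon, which limits how far compounding model errors can propagate into the chosen behavior.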
