Learning to Combat Compounding-Error in Model-Based Reinforcement Learning

12/24/2019
by Chenjun Xiao, et al.

Despite its potential to improve sample complexity versus model-free approaches, model-based reinforcement learning can fail catastrophically if the model is inaccurate. An algorithm should ideally be able to trust an imperfect model over a reasonably long planning horizon, and only rely on model-free updates when the model errors get infeasibly large. In this paper, we investigate techniques for choosing the planning horizon on a state-dependent basis, where a state's planning horizon is determined by the maximum cumulative model error around that state. We demonstrate that these state-dependent model errors can be learned with Temporal Difference methods, based on a novel approach of temporally decomposing the cumulative model errors. Experimental results show that the proposed method can successfully adapt the planning horizon to account for state-dependent model accuracy, significantly improving the efficiency of policy learning compared to model-based and model-free baselines.
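The idea of temporally decomposing cumulative model error can be illustrated with a small tabular sketch. This is not the paper's algorithm: it assumes a fixed policy, a known non-negative one-step model error per transition, and a hypothetical `threshold` that decides when the model is no longer trusted. The names `td_cumulative_error`, `planning_horizon`, and the toy chain are illustrative only. The h-step cumulative error satisfies the recursion E_h(s) = e(s) + gamma * E_{h-1}(s'), which is what lets it be learned with a TD-style update.

```python
import numpy as np

def td_cumulative_error(transitions, n_states, max_h, gamma=1.0, alpha=0.5, sweeps=200):
    """Estimate E[h, s]: discounted model error accumulated over an h-step rollout from s.

    transitions: list of (s, err, s_next) tuples, where err is the observed
    one-step model error at s (hypothetical input format, assumed for this sketch).
    """
    E = np.zeros((max_h + 1, n_states))  # E[0, :] stays 0: no rollout, no error
    for _ in range(sweeps):
        for s, err, s_next in transitions:
            for h in range(1, max_h + 1):
                # TD target: immediate error plus the (h-1)-step error at the next state
                target = err + gamma * E[h - 1, s_next]
                E[h, s] += alpha * (target - E[h, s])
    return E

def planning_horizon(E, threshold):
    """Per-state largest h whose estimated cumulative error is within threshold.

    Assumes non-negative one-step errors, so E is non-decreasing in h and
    counting the rows that pass the test gives the largest valid h.
    """
    within = E <= threshold  # shape (max_h + 1, n_states)
    return within.sum(axis=0) - 1

# Toy 5-state chain: s -> s + 1, with the last state looping onto itself.
# The model is assumed accurate near state 0 and poor near state 4.
one_step_error = [0.0, 0.0, 0.1, 0.5, 1.0]
transitions = [(s, one_step_error[s], min(s + 1, 4)) for s in range(5)]

E = td_cumulative_error(transitions, n_states=5, max_h=3)
horizons = planning_horizon(E, threshold=0.3)
print(horizons)  # [3 2 1 0 0]
```

In this toy setting, states where the model is accurate receive the full three-step horizon, while states near the inaccurate region fall back to a horizon of zero, which corresponds to relying on model-free updates only.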

