Objective Mismatch in Model-based Reinforcement Learning

by Nathan Lambert, et al.

Model-based reinforcement learning (MBRL) is a powerful framework for learning to control continuous systems in a data-efficient way. Recent work in MBRL has focused mostly on more advanced function approximators and planning schemes, with little development of the general framework itself. In this paper, we identify a fundamental issue of the standard MBRL framework, which we call the objective mismatch issue. Objective mismatch arises when one objective is optimized in the hope that a second, often uncorrelated, metric will also improve. In the context of MBRL, we characterize the mismatch between training the forward dynamics model to maximize the likelihood of its one-step-ahead predictions and the overall goal of improving performance on a downstream control task. For example, a dynamics model that is effective for a specific task need not be globally accurate, and conversely, a globally accurate model may not be accurate enough locally to yield good control performance on that task. In our experiments, we study this objective mismatch and demonstrate that the likelihood of one-step-ahead predictions is not always correlated with control performance. This observation highlights a critical limitation of the MBRL framework that will require further research to be fully understood and addressed. We propose an initial method for mitigating the mismatch by re-weighting the dynamics model's training loss, and, building on it, we conclude with a discussion of other potential research directions for addressing this issue.
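The re-weighting idea can be sketched in a few lines: instead of fitting the one-step dynamics model uniformly over all logged transitions, transitions in the task-relevant region of state space get larger weight in the regression loss. The sketch below is illustrative only and assumes a toy linear system and a hypothetical relevance weighting (distance to the task's goal state); it is not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D dynamics: s' = 0.9*s + 0.5*a + noise (ground truth, assumed for illustration)
S = rng.uniform(-5, 5, size=200)
A = rng.uniform(-1, 1, size=200)
S_next = 0.9 * S + 0.5 * A + 0.01 * rng.standard_normal(200)

X = np.stack([S, A], axis=1)  # model inputs (s, a)

# Hypothetical task-relevance weights: emphasize transitions near the goal s = 0,
# e.g. states a good policy for this task would actually visit.
weights = np.exp(-np.abs(S))

# Weighted least squares for the one-step model s' ~ theta[0]*s + theta[1]*a:
# scaling rows by sqrt(w) turns the weighted loss into an ordinary lstsq problem.
sw = np.sqrt(weights)
theta, *_ = np.linalg.lstsq(sw[:, None] * X, sw * S_next, rcond=None)
print(theta)  # close to the true coefficients [0.9, 0.5]
```

With uniform weights this reduces to standard maximum-likelihood (least-squares) model fitting; the weighting simply biases model capacity toward the states that matter for the downstream control task.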


