The LoCA Regret: A Consistent Metric to Evaluate Model-Based Behavior in Reinforcement Learning

by Harm van Seijen, et al.

Deep model-based Reinforcement Learning (RL) has the potential to substantially improve the sample-efficiency of deep RL. While various challenges have long held it back, a number of papers have recently come out reporting success with deep model-based methods. This is a great development, but the lack of a consistent metric to evaluate such methods makes it difficult to compare various approaches. For example, the common single-task sample-efficiency metric conflates improvements due to model-based learning with various other aspects, such as representation learning, making it difficult to assess true progress on model-based RL. To address this, we introduce an experimental setup to evaluate model-based behavior of RL methods, inspired by work from neuroscience on detecting model-based behavior in humans and animals. Our metric based on this setup, the Local Change Adaptation (LoCA) regret, measures how quickly an RL method adapts to a local change in the environment. Our metric can identify model-based behavior even if the method uses a poor representation, and it provides insight into how close a method's behavior is to optimal model-based behavior. We use our setup to evaluate the model-based behavior of MuZero on a variation of the classic Mountain Car task.


