When to use parametric models in reinforcement learning?

06/12/2019
by Hado van Hasselt, et al.

We examine the question of when and how parametric models are most useful in reinforcement learning. In particular, we look at commonalities and differences between parametric models and experience replay. Replay-based learning algorithms share important traits with model-based approaches, including the ability to plan: to use more computation without additional data to improve predictions and behaviour. We discuss when to expect benefits from either approach, and interpret prior work in this context. We hypothesise that, under suitable conditions, replay-based algorithms should be competitive with or better than model-based algorithms if the model is used only to generate fictional transitions from observed states for an update rule that is otherwise model-free. We validated this hypothesis on Atari 2600 video games. The replay-based algorithm attained state-of-the-art data efficiency, improving over prior results with parametric models.
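To make the distinction the abstract draws more concrete, here is a minimal tabular sketch in Python, not the paper's algorithm: it contrasts a replay-based update (re-using stored real transitions) with a Dyna-style model-based update that only generates fictional transitions from previously observed states, both feeding the same model-free Q-learning rule. The toy environment and all names (q_update, plan_with_replay, plan_with_model) are illustrative assumptions.

import random
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.99
ACTIONS = [0, 1]

Q = defaultdict(float)   # Q[(state, action)], defaults to 0.0
replay_buffer = []       # stored real transitions (s, a, r, s_next)
model = {}               # simple tabular model: (s, a) -> (r, s_next)
observed_states = set()  # states the agent has actually visited

def q_update(s, a, r, s_next):
    # Model-free Q-learning update, shared by both planning styles.
    target = r + GAMMA * max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

def observe(s, a, r, s_next):
    # Store a real transition in the buffer and in the tabular model.
    replay_buffer.append((s, a, r, s_next))
    model[(s, a)] = (r, s_next)
    observed_states.add(s)

def plan_with_replay(n_updates):
    # Replay-based planning: extra computation on stored real data.
    for _ in range(n_updates):
        s, a, r, s_next = random.choice(replay_buffer)
        q_update(s, a, r, s_next)

def plan_with_model(n_updates):
    # Model-based planning restricted to observed states: the model
    # invents only the outcome of (s, a), not the starting state.
    for _ in range(n_updates):
        s = random.choice(list(observed_states))
        a = random.choice(ACTIONS)
        if (s, a) in model:
            r, s_next = model[(s, a)]
            q_update(s, a, r, s_next)

if __name__ == "__main__":
    # Fabricated two-step chain, purely for illustration.
    observe("s0", 0, 0.0, "s1")
    observe("s1", 1, 1.0, "s2")
    plan_with_replay(100)   # more computation, no additional data
    plan_with_model(100)    # same update rule, model-generated transitions
    print(dict(Q))

Both planning routines call the same model-free update; the only difference is where the transitions come from, which mirrors the paper's hypothesis that a parametric model used solely to generate fictional transitions from observed states offers little beyond replay.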

