Planning by Prioritized Sweeping with Small Backups

01/10/2013
by Harm van Seijen, et al.

Efficient planning plays a crucial role in model-based reinforcement learning. Traditionally, the main planning operation is a full backup based on the current estimates of the successor states. Consequently, its computation time is proportional to the number of successor states. In this paper, we introduce a new planning backup that uses only the current value of a single successor state and has a computation time independent of the number of successor states. This new backup, which we call a small backup, opens the door to a new class of model-based reinforcement learning methods that exhibit much finer control over their planning process than traditional methods. We empirically demonstrate that this increased flexibility allows for more efficient planning by showing that an implementation of prioritized sweeping based on small backups achieves a substantial performance improvement over classical implementations.
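The contrast between the two backups can be illustrated with a minimal sketch. It assumes a policy-evaluation setting with a fixed, known model, and it assumes that the single-successor update works by caching the successor value used in the previous update; this is one way to get a cost independent of the number of successor states, and is not necessarily the exact formulation used in the paper. The names `full_backup`, `small_backup`, `U`, `GAMMA`, and the toy model are all hypothetical and chosen for illustration only.

```python
GAMMA = 0.9  # discount factor (assumed value for this illustration)


def full_backup(V, s, reward, probs):
    """Recompute V[s] from all successors: cost grows with their number."""
    V[s] = reward[s] + GAMMA * sum(p * V[s2] for s2, p in probs[s].items())


def small_backup(V, U, s, s2, probs):
    """Update V[s] using only the current value of one successor s2.

    U[(s, s2)] caches the value of s2 that V[s] currently reflects
    (an assumption of this sketch), so the update is constant time.
    """
    V[s] += GAMMA * probs[s][s2] * (V[s2] - U[(s, s2)])
    U[(s, s2)] = V[s2]


# Toy model: state "a" moves to "b" or "c" with equal probability.
reward = {"a": 1.0}
probs = {"a": {"b": 0.5, "c": 0.5}}

# V["a"] starts consistent with the cached successor values (all zero).
V = {"a": 1.0, "b": 0.0, "c": 0.0}
U = {("a", "b"): 0.0, ("a", "c"): 0.0}

# Successor values change one at a time; each change is propagated
# to "a" with a single small backup.
V["b"] = 2.0
small_backup(V, U, "a", "b", probs)
V["c"] = 4.0
small_backup(V, U, "a", "c", probs)

# Reference: a full backup computed from the same successor values.
V_full = dict(V)
full_backup(V_full, "a", reward, probs)
```

In this sketch, once every changed successor has been propagated with a small backup, `V["a"]` matches the value a full backup would produce. Because each small backup touches only one successor, a planner such as prioritized sweeping can choose which individual successor updates to perform rather than committing to a whole full backup at a time.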


