Reward-Respecting Subtasks for Model-Based Reinforcement Learning

by Richard S. Sutton, et al.

To achieve the ambitious goals of artificial intelligence, reinforcement learning must include planning with a model of the world that is abstract in state and time. Deep learning has made progress in state abstraction, but, although the theory of time abstraction has been extensively developed based on the options framework, in practice options have rarely been used in planning. One reason for this is that the space of possible options is immense and the methods previously proposed for option discovery do not take into account how the option models will be used in planning. Options are typically discovered by posing subsidiary tasks such as reaching a bottleneck state, or maximizing a sensory signal other than the reward. Each subtask is solved to produce an option, and then a model of the option is learned and made available to the planning process. The subtasks proposed in most previous work ignore the reward on the original problem, whereas we propose subtasks that use the original reward plus a bonus based on a feature of the state at the time the option stops. We show that options and option models obtained from such reward-respecting subtasks are much more likely to be useful in planning and can be learned online and off-policy using existing learning algorithms. Reward-respecting subtasks strongly constrain the space of options and thereby also provide a partial solution to the problem of option discovery. Finally, we show how the algorithms for learning values, policies, options, and models can be unified using general value functions.
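The core distinction in the abstract can be illustrated concretely. The sketch below (an illustration, not the paper's code; the function name and numbers are invented for the example) contrasts the return an option earns under a reward-respecting subtask, which keeps the original per-step rewards and adds a stopping bonus based on a feature of the stopping state, with a reward-ignoring subtask, which credits only the stopping bonus.

```python
def subtask_return(rewards, stop_bonus, gamma=0.99, reward_respecting=True):
    """Discounted return of one option execution under a subtask.

    rewards      -- per-step rewards from the original problem while the option runs
    stop_bonus   -- bonus for the feature of the state where the option stops
    gamma        -- discount factor
    """
    g = 0.0
    if reward_respecting:
        # A reward-respecting subtask accumulates the original rewards.
        for t, r in enumerate(rewards):
            g += gamma ** t * r
    # Both kinds of subtask receive the stopping bonus when the option
    # terminates, discounted by the option's duration.
    g += gamma ** len(rewards) * stop_bonus
    return g

# An option that incurs small costs on the way to a high-bonus stopping state:
# a reward-respecting subtask charges those costs against the bonus, whereas a
# reward-ignoring subtask (e.g. "reach a bottleneck") would not.
rewards = [-1.0, -1.0, -1.0]
print(subtask_return(rewards, stop_bonus=10.0))
print(subtask_return(rewards, stop_bonus=10.0, reward_respecting=False))
```

Because the reward-ignoring variant never sees the original problem's costs, it can favor options that planning with the true reward would reject; that gap is what the reward-respecting formulation closes.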




