The Option-Critic Architecture

09/16/2016
by Pierre-Luc Bacon, et al.

Temporal abstraction is key to scaling up learning and planning in reinforcement learning. While planning with temporally extended actions is well understood, creating such abstractions autonomously from data has remained challenging. We tackle this problem in the framework of options [Sutton, Precup & Singh, 1999; Precup, 2000]. We derive policy gradient theorems for options and propose a new option-critic architecture capable of learning both the internal policies and the termination conditions of options, in tandem with the policy over options, and without the need to provide any additional rewards or subgoals. Experimental results in both discrete and continuous environments showcase the flexibility and efficiency of the framework.
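The abstract describes learning three coupled components: the intra-option policies, the option termination conditions, and the policy over options, with a critic tying them together. As a rough illustration only, here is a minimal tabular sketch of one option-critic update in that spirit. Everything concrete here is an assumption for illustration: the toy sizes, the softmax/sigmoid parameterizations, and the helper names (`theta`, `vartheta`, `Q_U`, `option_critic_step`) are not from the paper, which should be consulted for the actual gradient theorems.

```python
import numpy as np

# Hypothetical toy problem sizes and step sizes (illustrative only).
n_states, n_actions, n_options = 5, 2, 2
alpha, gamma = 0.1, 0.99

theta = np.zeros((n_options, n_states, n_actions))  # intra-option policy parameters
vartheta = np.zeros((n_options, n_states))          # termination parameters
Q_U = np.zeros((n_options, n_states, n_actions))    # critic: value of (state, option, action)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def q_omega(s):
    """Q_Omega(s, o) = sum_a pi_o(a|s) * Q_U(s, o, a), for each option o."""
    return np.array([softmax(theta[o, s]) @ Q_U[o, s] for o in range(n_options)])

def option_critic_step(s, o, a, r, s_next):
    """One update at transition (s, o, a, r, s_next), all three learners in tandem."""
    # --- critic: one-step TD update on Q_U ---
    beta = sigmoid(vartheta[o, s_next])   # probability option o terminates in s'
    q_next = q_omega(s_next)
    # Continue with o w.p. (1 - beta); otherwise switch to the greedy option.
    u = (1.0 - beta) * q_next[o] + beta * q_next.max()
    td = r + gamma * u - Q_U[o, s, a]
    Q_U[o, s, a] += alpha * td

    # --- intra-option policy: ascend grad log pi_o(a|s) weighted by the critic ---
    pi = softmax(theta[o, s])
    grad_log = -pi
    grad_log[a] += 1.0
    theta[o, s] += alpha * Q_U[o, s, a] * grad_log

    # --- termination: raise beta where the option's advantage is negative ---
    advantage = q_next[o] - q_next.max()
    vartheta[o, s_next] -= alpha * beta * (1.0 - beta) * advantage
```

Note that no subgoal rewards appear anywhere: the environment reward `r` drives the critic, and both the intra-option policies and the terminations are updated from the critic alone, which is the property the abstract emphasizes.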


research
10/27/2018

Learning Abstract Options

Building systems that autonomously create temporal abstractions from dat...
research
12/31/2019

On the Role of Weight Sharing During Deep Option Learning

The options framework is a popular approach for building temporally exte...
research
11/30/2017

Learning Options End-to-End for Continuous Action Tasks

We present new results on learning temporally extended actions for conti...
research
02/26/2019

The Termination Critic

In this work, we consider the problem of autonomously discovering behavi...
research
01/01/2020

Options of Interest: Temporal Abstraction with Interest Functions

Temporal abstraction refers to the ability of an agent to use behaviours...
research
12/04/2018

Natural Option Critic

The recently proposed option-critic architecture of Bacon et al. provides a ...
research
10/18/2021

MDP Abstraction with Successor Features

Abstraction plays an important role for generalisation of knowledge and ...
