The Reactor: A Sample-Efficient Actor-Critic Architecture

04/15/2017
by Audrūnas Gruslys, et al.

In this work we present a new reinforcement learning agent, called Reactor (for Retrace-actor), based on an off-policy multi-step return actor-critic architecture. The agent uses a deep recurrent neural network for function approximation. The network outputs a target policy π (the actor), an action-value Q-function (the critic) evaluating the current policy π, and an estimated behavioral policy μ̂ which we use for off-policy correction. The agent maintains a memory buffer filled with past experiences. The critic is trained by the multi-step off-policy Retrace algorithm, and the actor is trained by a novel β-leave-one-out policy gradient estimate (which uses both the off-policy corrected return and the estimated Q-function). The Reactor is sample-efficient thanks to the use of memory replay, and numerically efficient since it uses multi-step returns. Both acting and learning can also be parallelized. We evaluate our algorithm on 57 Atari 2600 games and demonstrate that it achieves state-of-the-art performance.
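
The abstract does not spell out the Retrace update, so the following is only a minimal sketch of how multi-step off-policy targets for the critic could be computed on one stored trajectory. It assumes the standard recursive form of Retrace, with truncated importance weights c_t = λ min(1, π(a_t|x_t) / μ(a_t|x_t)). The function name `retrace_targets` and all argument names are hypothetical, and in the Reactor the behavior probabilities would come from the estimated policy μ̂ rather than a known μ.

```python
def retrace_targets(rewards, q_taken, v_next, pi_taken, mu_taken,
                    gamma=0.99, lam=1.0):
    """Retrace targets for one trajectory of length T (illustrative sketch).

    rewards[t]  : reward r_t
    q_taken[t]  : critic estimate Q(x_t, a_t) for the action actually taken
    v_next[t]   : sum_a pi(a | x_{t+1}) * Q(x_{t+1}, a), or 0.0 if x_{t+1} is terminal
    pi_taken[t] : target-policy probability pi(a_t | x_t)
    mu_taken[t] : behavior-policy probability mu(a_t | x_t) (assumed > 0)
    """
    T = len(rewards)
    targets = [0.0] * T
    # Holds c_{t+1} * (target_{t+1} - Q(x_{t+1}, a_{t+1})); zero past the last step.
    correction = 0.0
    for t in reversed(range(T)):
        # One-step expected backup plus the off-policy-corrected multi-step term.
        targets[t] = rewards[t] + gamma * (v_next[t] + correction)
        # Truncated importance weight: never above lam, so variance stays bounded.
        c_t = lam * min(1.0, pi_taken[t] / mu_taken[t])
        correction = c_t * (targets[t] - q_taken[t])
    return targets


# On-policy data (pi == mu), lam = 1, and a zero critic: the targets reduce
# to ordinary discounted Monte Carlo returns.
on_policy = retrace_targets(
    rewards=[1.0, 1.0], q_taken=[0.0, 0.0], v_next=[0.0, 0.0],
    pi_taken=[0.5, 0.5], mu_taken=[0.5, 0.5], gamma=1.0)

# If the target policy would never take the second action, its weight is 0
# and the trace is cut: the first target falls back to a one-step backup.
cut_trace = retrace_targets(
    rewards=[1.0, 1.0], q_taken=[0.0, 0.0], v_next=[0.0, 0.0],
    pi_taken=[0.5, 0.0], mu_taken=[0.5, 0.5], gamma=1.0)
```

The truncation at 1 is what lets long multi-step returns be reused from a replay buffer: trajectories that agree with π propagate reward over many steps, while those that diverge are cut short instead of being reweighted by large ratios.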

research
02/21/2018

Convergent Actor-Critic Algorithms Under Off-Policy Training and Function Approximation

We present the first class of policy-gradient algorithms that work with ...
research
06/23/2020

The Effect of Multi-step Methods on Overestimation in Deep Reinforcement Learning

Multi-step (also called n-step) methods in reinforcement learning (RL) h...
research
11/26/2019

Dynamic Portfolio Management with Reinforcement Learning

Dynamic Portfolio Management is a domain that concerns the continuous re...
research
05/06/2021

Deep Graph Convolutional Reinforcement Learning for Financial Portfolio Management – DeepPocket

Portfolio management aims at maximizing the return on investment while m...
research
05/21/2017

Learning to Mix n-Step Returns: Generalizing lambda-Returns for Deep Reinforcement Learning

Reinforcement Learning (RL) can model complex behavior policies for goal...
research
06/16/2021

Solving Continuous Control with Episodic Memory

Episodic memory lets reinforcement learning algorithms remember and expl...
research
08/23/2022

An intelligent algorithmic trading based on a risk-return reinforcement learning algorithm

This scientific paper proposes a novel portfolio optimization model using...
