Better Exploration with Optimistic Actor-Critic

10/28/2019
by Kamil Ciosek, et al.

Actor-critic methods, a type of model-free reinforcement learning, have been successfully applied to challenging tasks in continuous control, often achieving state-of-the-art performance. However, wide-scale adoption of these methods in real-world domains is hindered by their poor sample efficiency. We address this problem both theoretically and empirically. On the theoretical side, we identify two phenomena that prevent efficient exploration in existing state-of-the-art algorithms such as Soft Actor-Critic. First, combining a greedy actor update with a pessimistic estimate of the critic leads the agent to avoid actions it does not know about, a phenomenon we call pessimistic underexploration. Second, current algorithms are directionally uninformed: they sample actions with equal probability in opposite directions from the current mean. This is wasteful, since actions along some directions are typically needed far more than along others. To address both phenomena, we introduce a new algorithm, Optimistic Actor-Critic (OAC), which approximates a lower and an upper confidence bound on the state-action value function. This allows us to apply the principle of optimism in the face of uncertainty: we perform directed exploration using the upper bound while still using the lower bound to avoid overestimation. We evaluate OAC on several challenging continuous control tasks, achieving state-of-the-art sample efficiency.
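
Concretely, the two bounds and the directed exploration step can be sketched in code. The following is a minimal, hypothetical PyTorch sketch, not the authors' implementation: q1, q2, beta_ub, and delta are assumed names with illustrative values, and the KL-constrained mean shift is one plausible way to realize the directed exploration the abstract describes.

```python
import torch

def oac_exploration_action(mu, std, q1, q2, state, beta_ub=1.0, delta=1.0):
    """Draw a directionally informed exploration action for one state.

    mu, std : mean and std of the target Gaussian policy at `state`
    q1, q2  : two bootstrapped critics, callables (state, action) -> value
    beta_ub : optimism level for the upper bound (illustrative value)
    delta   : KL budget between target and exploration policies (illustrative)
    """
    a = mu.detach().clone().requires_grad_(True)

    # Epistemic uncertainty from the critic ensemble: mean and spread.
    q_vals = torch.stack([q1(state, a), q2(state, a)])
    q_mean, q_spread = q_vals.mean(0), q_vals.std(0)

    # The upper confidence bound drives exploration; the pessimistic lower
    # bound (e.g. torch.min(q_vals, 0).values) would feed the critic targets.
    q_ub = q_mean + beta_ub * q_spread
    grad = torch.autograd.grad(q_ub.sum(), a)[0]

    # Shift the exploration mean along the UCB gradient, scaled so that the
    # KL divergence to the target policy stays within the budget `delta`.
    var = std.pow(2)
    grad_norm = torch.sqrt((grad.pow(2) * var).sum()).clamp_min(1e-8)
    mu_explore = mu + torch.sqrt(torch.tensor(2.0 * delta)) * var * grad / grad_norm

    # Sampling around the shifted mean favours the promising direction
    # instead of exploring symmetrically around `mu`.
    return torch.normal(mu_explore, std)
```

In this sketch, optimism only affects which actions are tried: the pessimistic lower bound would still be used for value updates, as in Soft Actor-Critic, so the directed exploration does not reintroduce overestimation.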


Related research

Sample-Efficient Model-Free Reinforcement Learning with Off-Policy Critics (03/11/2019)
Value-based reinforcement-learning algorithms are currently state-of-the...

Wasserstein Actor-Critic: Directed Exploration via Optimism for Continuous-Actions Control (03/04/2023)
Uncertainty quantification has been extensively used as a means to achie...

Some Supervision Required: Incorporating Oracle Policies in Reinforcement Learning via Epistemic Uncertainty Metrics (08/22/2022)
An inherent problem in reinforcement learning is coping with policies th...

ADER: Adapting between Exploration and Robustness for Actor-Critic Methods (09/08/2021)
Combining off-policy reinforcement learning methods with function approx...

Attention-based Curiosity-driven Exploration in Deep Reinforcement Learning (10/23/2019)
Reinforcement Learning enables training an agent via interaction with th...

Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning (08/19/2021)
Actor-critic methods are widely used in offline reinforcement learning p...

Efficient Exploration in Resource-Restricted Reinforcement Learning (12/14/2022)
In many real-world applications of reinforcement learning (RL), performi...
