OPAC: Opportunistic Actor-Critic

by Srinjoy Roy, et al.

Actor-critic methods, a class of model-free reinforcement learning (RL) algorithms, have achieved state-of-the-art performance in many real-world continuous control domains. Despite this success, wide-scale deployment of these methods remains a distant goal. Their main problems are inefficient exploration and sub-optimal policies. Soft Actor-Critic (SAC) and Twin Delayed Deep Deterministic Policy Gradient (TD3), two leading algorithms of this kind, both suffer from these issues. SAC effectively addressed the problems of sample complexity and brittle convergence with respect to hyper-parameters, and thus outperformed prior state-of-the-art algorithms, including TD3, on harder tasks, whereas TD3 produced moderate results in all environments. SAC nevertheless suffers from inefficient exploration owing to the Gaussian nature of its policy, which leads to borderline performance on simpler tasks. In this paper, we introduce Opportunistic Actor-Critic (OPAC), a novel model-free deep RL algorithm that employs a better exploration policy and has lower variance. OPAC combines some of the most powerful features of TD3 and SAC and aims to optimize a stochastic policy in an off-policy way. To calculate the target Q-values, OPAC uses three critics instead of two and, based on the environment complexity, opportunistically chooses how the target Q-value is computed from the critics' evaluations. We systematically evaluate the algorithm on MuJoCo environments, where it achieves state-of-the-art performance and matches or outperforms TD3 and SAC.
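The three-critic target computation described in the abstract can be illustrated with a minimal sketch. The abstract does not say which aggregation rules OPAC switches between, so the `min`, `median`, and `mean` options below are assumptions made for illustration, as are the function name `target_q`, the `mode` parameter, and the SAC-style entropy term weighted by `alpha`.

```python
from statistics import fmean, median

# Hypothetical aggregation rules. The abstract only states that the rule is
# chosen "opportunistically" based on environment complexity; it does not
# name the candidates, so these three are illustrative assumptions.
AGGREGATORS = {"min": min, "median": median, "mean": fmean}


def target_q(reward, done, next_q_values, next_log_prob,
             mode="median", gamma=0.99, alpha=0.2):
    """Compute a one-step target Q-value from three critic estimates.

    next_q_values: the three critics' estimates for the next state-action pair.
    next_log_prob: log-probability of the sampled next action (assumed
                   SAC-style entropy bonus; not confirmed by the abstract).
    mode:          which aggregation rule to apply to the three estimates.
    """
    if len(next_q_values) != 3:
        raise ValueError("expected exactly three critic estimates")
    if mode not in AGGREGATORS:
        raise ValueError(f"unknown aggregation mode: {mode!r}")

    # Collapse the three critic estimates into a single value.
    aggregated = AGGREGATORS[mode](next_q_values)

    # Soft value: aggregated estimate plus an entropy bonus (assumption).
    soft_value = aggregated - alpha * next_log_prob

    # Standard bootstrapped target; terminal transitions drop the bootstrap.
    return reward + gamma * (1.0 - float(done)) * soft_value
```

For a simple environment one might call `target_q(r, d, qs, logp, mode="mean")`, and for a harder one `mode="median"`. How the mode is actually selected is left to the paper itself.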




