SAAC: Safe Reinforcement Learning as an Adversarial Game of Actor-Critics

04/20/2022
by Yannis Flet-Berliac et al.

Although Reinforcement Learning (RL) is effective for sequential decision-making problems under uncertainty, it still struggles in real-world systems where risk or safety is a binding constraint. In this paper, we formulate the RL problem with safety constraints as a non-zero-sum game. When deployed with maximum entropy RL, this formulation leads to a safe adversarially guided soft actor-critic framework, called SAAC. In SAAC, the adversary aims to break the safety constraint while the RL agent aims to maximize the constrained value function given the adversary's policy. The safety constraint on the agent's value function manifests only as a repulsion term between the agent's and the adversary's policies. Unlike previous approaches, SAAC can address different safety criteria such as safe exploration, mean-variance risk sensitivity, and CVaR-like coherent risk sensitivity. We illustrate the design of the adversary for these constraints. Then, in each of these variations, we show that the agent differentiates itself from the adversary's unsafe actions, in addition to learning to solve the task. Finally, on challenging continuous control tasks, we demonstrate that SAAC achieves faster convergence, better efficiency, and fewer failures to satisfy the safety constraints than risk-averse distributional RL and risk-neutral soft actor-critic algorithms.
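
The repulsion term described above suggests a compact modification of the standard soft actor-critic update. The sketch below is a minimal illustration, not the authors' implementation: the names (GaussianPolicy, saac_actor_loss), the placeholder critic q_fn, and the coefficients alpha (entropy temperature) and beta (repulsion weight) are all assumptions, and the repulsion is instantiated here as a KL divergence between the agent's and the adversary's action distributions, one plausible choice since the abstract does not fix its exact form.

import torch
import torch.nn as nn
from torch.distributions import Normal, kl_divergence

class GaussianPolicy(nn.Module):
    # Diagonal Gaussian policy over continuous actions (illustrative).
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, act_dim)
        self.log_std = nn.Linear(hidden, act_dim)

    def dist(self, obs):
        h = self.body(obs)
        return Normal(self.mu(h), self.log_std(h).clamp(-5.0, 2.0).exp())

def saac_actor_loss(agent, adversary, q_fn, obs, alpha=0.2, beta=1.0):
    # Standard SAC actor objective: maximize Q minus entropy temperature
    # times log-probability, written here as a loss to minimize.
    pi = agent.dist(obs)
    action = pi.rsample()                       # reparameterized sample
    log_prob = pi.log_prob(action).sum(-1)
    sac_term = (alpha * log_prob - q_fn(obs, action)).mean()
    # Repulsion from the adversary's (unsafe) policy: minimizing the loss
    # increases the KL divergence, pushing the agent away from the adversary.
    with torch.no_grad():                       # adversary held fixed in this update
        adv = adversary.dist(obs)
    repulsion = kl_divergence(pi, adv).sum(-1).mean()
    return sac_term - beta * repulsion

# Usage sketch with a dummy critic:
obs_dim, act_dim = 8, 2
agent = GaussianPolicy(obs_dim, act_dim)
adversary = GaussianPolicy(obs_dim, act_dim)
q_fn = lambda o, a: torch.zeros(o.shape[0])     # placeholder for a learned Q-network
loss = saac_actor_loss(agent, adversary, q_fn, torch.randn(32, obs_dim))
loss.backward()

In a full training loop, the adversary would be updated in parallel to maximize constraint violation under the chosen safety criterion (safe exploration, mean-variance, or CVaR-like); here it is simply held fixed while the agent's loss is computed.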

Related research

- 03/07/2023: A Multiplicative Value Function for Safe and Efficient Reinforcement Learning
  An emerging field of sequential decision problems is safe Reinforcement ...
- 02/13/2020: Improving Generalization of Reinforcement Learning with Minimax Distributional Soft Actor-Critic
  Reinforcement learning (RL) has achieved remarkable performance in a var...
- 09/13/2023: Safe Reinforcement Learning with Dual Robustness
  Reinforcement learning (RL) agents are vulnerable to adversarial disturb...
- 10/21/2021: Is High Variance Unavoidable in RL? A Case Study in Continuous Control
  Reinforcement learning (RL) experiments have notoriously high variance, ...
- 12/19/2020: Model-Based Actor-Critic with Chance Constraint for Stochastic System
  Safety constraints are essential for reinforcement learning (RL) applied...
- 01/26/2023: Efficient Trust Region-Based Safe Reinforcement Learning with Low-Bias Distributional Actor-Critic
  To apply reinforcement learning (RL) to real-world applications, agents ...
- 12/01/2019: Adversary A3C for Robust Reinforcement Learning
  Asynchronous Advantage Actor Critic (A3C) is an effective Reinforcement ...
