Don't do it: Safer Reinforcement Learning With Rule-based Guidance

12/28/2022
by Ekaterina Nikonova, et al.

During training, reinforcement learning systems interact with the world without considering the safety of their actions. When deployed in the real world, such systems can be dangerous and cause harm to their surroundings. Dangerous situations can often be mitigated by defining a set of rules that the system must not violate under any conditions. For example, in robot navigation, one safety rule would be to avoid colliding with surrounding objects and people. In this work, we define safety rules in terms of the relationships between the agent and objects, and use them to prevent reinforcement learning systems from performing potentially harmful actions. We propose a new safe epsilon-greedy algorithm that uses these safety rules to override the agent's actions when they are considered unsafe. In our experiments, we show that a safe epsilon-greedy policy significantly increases the safety of the agent during training, improves learning efficiency by converging much faster, and achieves better performance than the base model.
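The core idea of such a safe epsilon-greedy policy can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: the rule predicate `is_safe(state, action)`, the fallback behavior when no action passes the rules, and the function names are all assumptions for the sake of the example.

```python
import random

def safe_epsilon_greedy(q_values, state, is_safe, epsilon=0.1):
    """Pick an action epsilon-greedily, restricted to rule-safe actions.

    q_values: list of Q(state, a) estimates indexed by action.
    is_safe:  hypothetical predicate (state, action) -> bool encoding
              the safety rules (e.g. "do not move toward an obstacle").
    """
    actions = range(len(q_values))
    # Filter out actions the safety rules flag as unsafe.
    safe_actions = [a for a in actions if is_safe(state, a)]
    if not safe_actions:
        # Fallback if every action is flagged unsafe; a real system
        # might instead trigger a safe default behavior here.
        safe_actions = list(actions)
    if random.random() < epsilon:
        # Explore, but only among the safe actions.
        return random.choice(safe_actions)
    # Exploit: greedy choice over the safe subset.
    return max(safe_actions, key=lambda a: q_values[a])
```

For instance, with Q-values `[5.0, 1.0, 3.0]` and a rule forbidding action 0, the greedy choice (epsilon = 0) is overridden from action 0 to action 2, the best action among those the rules allow.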

