DEFENDER: DTW-Based Episode Filtering Using Demonstrations for Enhancing RL Safety

05/08/2023
by André Correia, et al.

Deploying reinforcement learning (RL) agents in the real world can be challenging because of the risks inherent in learning through trial and error. We propose a task-agnostic method that leverages small sets of safe and unsafe demonstrations to improve the safety of RL agents during learning. At every step, the method compares the agent's current trajectory with both sets of demonstrations and filters the trajectory if it resembles the unsafe demonstrations. We perform ablation studies on different filtering strategies and investigate the impact of the number of demonstrations on performance. Our method is compatible with any stand-alone RL algorithm and can be applied to any task. We evaluate our method on three tasks from OpenAI Gym's MuJoCo benchmark with two state-of-the-art RL algorithms. The results demonstrate that our method significantly reduces the crash rate of the agent while converging to, and in most cases even improving on, the performance of the stand-alone agent.
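
To make the per-step comparison concrete, the following is a minimal sketch, not the authors' implementation. It assumes trajectories are lists of state vectors, uses textbook dynamic time warping (DTW) with Euclidean step cost, and applies a hypothetical decision rule: flag the trajectory when its nearest unsafe demonstration is closer than its nearest safe one. The names `dtw_distance` and `should_filter`, and the choice to compare against demonstration prefixes of the same length as the current trajectory, are illustrative assumptions that do not come from the abstract.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW distance between two sequences of equal-dimension state vectors."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between states
            cost[i][j] = step + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


def should_filter(trajectory, safe_demos, unsafe_demos):
    """Hypothetical rule: filter if the nearest unsafe demo is closer than the nearest safe demo.

    Each demonstration is truncated to the current trajectory length before
    comparison (an assumption made for this sketch).
    """
    k = len(trajectory)
    d_safe = min(dtw_distance(trajectory, demo[:k]) for demo in safe_demos)
    d_unsafe = min(dtw_distance(trajectory, demo[:k]) for demo in unsafe_demos)
    return d_unsafe < d_safe


# Toy 1-D example: safe demos move in the positive direction, unsafe in the negative.
safe_demos = [[[0.0], [1.0], [2.0], [3.0]]]
unsafe_demos = [[[0.0], [-1.0], [-2.0], [-3.0]]]

drifting = [[0.0], [-1.0], [-2.0]]
on_track = [[0.0], [1.0]]
```

In a training loop, `should_filter` would be called after each environment step on the states collected so far. What "filtering" then does (for example, ending the episode early) is left open here, since the abstract does not specify it.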


