RACCER: Towards Reachable and Certain Counterfactual Explanations for Reinforcement Learning

03/08/2023
by Jasmina Gajcin, et al.

While reinforcement learning (RL) algorithms have been successfully applied to numerous tasks, their reliance on neural networks makes their behavior difficult to understand and trust. Counterfactual explanations are human-friendly explanations that offer users actionable advice on how to alter the model inputs to achieve a desired output from a black-box system. However, current approaches to generating counterfactuals in RL ignore the stochastic and sequential nature of RL tasks and can produce counterfactuals that are difficult to obtain or do not deliver the desired outcome. In this work, we propose RACCER, the first RL-specific approach to generating counterfactual explanations for the behavior of RL agents. We first propose and implement a set of RL-specific counterfactual properties that ensure easily reachable counterfactuals with highly probable desired outcomes. We then use a heuristic tree search over the agent's execution trajectories to find the most suitable counterfactuals according to these properties. We evaluate RACCER on two tasks and conduct a user study, showing that RL-specific counterfactuals help users understand the agent's behavior better than current state-of-the-art approaches.
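To make the idea of "reachable and certain" counterfactuals concrete, the following is a minimal illustrative sketch, not the authors' implementation. Everything in it is an assumption made for illustration: the toy line-world task, the `SLIP` probability, the `policy` function, the scoring rule (certainty minus a length penalty), and the use of exhaustive enumeration of short action sequences in place of the heuristic tree search described in the abstract. It only shows how stochastic transitions can make a longer action sequence preferable when it reaches the desired outcome with higher probability.

```python
from itertools import product

# Hypothetical toy task: the state is an integer position on a line 0..6.
ACTIONS = (-1, +1)
SLIP = 0.2  # assumed probability that an action has no effect


def transitions(state, action):
    """Return {next_state: probability} for one stochastic step."""
    moved = min(6, max(0, state + action))
    dist = {}
    dist[moved] = dist.get(moved, 0.0) + (1.0 - SLIP)
    dist[state] = dist.get(state, 0.0) + SLIP
    return dist


def policy(state):
    """Hypothetical black-box agent: chooses 'grab' only near the goal."""
    return "grab" if state >= 5 else "walk"


def outcome_distribution(start, plan):
    """Propagate the exact state distribution through a sequence of actions."""
    dist = {start: 1.0}
    for action in plan:
        nxt = {}
        for s, p in dist.items():
            for s2, q in transitions(s, action).items():
                nxt[s2] = nxt.get(s2, 0.0) + p * q
        dist = nxt
    return dist


def search_counterfactual(start, desired, max_depth=4, length_penalty=0.1):
    """Score every action sequence up to max_depth and return the best one.

    certainty: probability that the agent picks `desired` in the end state.
    score:     certainty minus a penalty per action (a stand-in for
               reachability; this weighting is an assumption).
    """
    best = None
    for depth in range(1, max_depth + 1):
        for plan in product(ACTIONS, repeat=depth):
            dist = outcome_distribution(start, plan)
            certainty = sum(p for s, p in dist.items() if policy(s) == desired)
            score = certainty - length_penalty * depth
            if best is None or score > best[0]:
                best = (score, plan, certainty)
    return best


if __name__ == "__main__":
    score, plan, certainty = search_counterfactual(start=3, desired="grab")
    print(plan, round(certainty, 3), round(score, 3))
```

In this toy setting, two steps to the right are the shortest route to a state where the agent would grab, but a third step raises the probability of actually getting there, so the sketch prefers the three-step sequence. A real search would prune the tree with a heuristic instead of enumerating every sequence.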


