Explanation through Reward Model Reconciliation using POMDP Tree Search

05/01/2023
by Benjamin D. Kraske, et al.

As artificial intelligence (AI) algorithms are increasingly used in mission-critical applications, promoting user trust in these systems will be essential to their success. Ensuring that users understand the models over which these algorithms reason promotes such trust. This work seeks to reconcile differences between the reward model that an algorithm uses for online partially observable Markov decision process (POMDP) planning and the implicit reward model assumed by a human user. Action discrepancies, differences between the decisions made by the algorithm and those made by the user, are leveraged to estimate the user's objectives, expressed as weightings of a reward function.
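To illustrate the idea of inferring reward weightings from action discrepancies, the following is a minimal sketch, not the authors' implementation. It assumes a noisily rational (Boltzmann) model of the user's action choice and a hypothetical helper q_value_fn that scores each action under a candidate weighting (e.g., via a short POMDP tree search with that reward model); the names candidate_weights, belief, and rationality are illustrative assumptions.

```python
import numpy as np

def action_likelihood(weights, belief, user_action, q_value_fn, rationality=5.0):
    """Boltzmann likelihood of the user's chosen action under one candidate weighting.

    q_value_fn(weights, belief) is assumed to return one Q-value per action,
    computed with the reward model implied by `weights` (e.g., a short tree search).
    """
    q = np.asarray(q_value_fn(weights, belief), dtype=float)
    probs = np.exp(rationality * (q - q.max()))  # softmax over action values
    probs /= probs.sum()
    return probs[user_action]

def update_weight_posterior(prior, candidate_weights, belief, user_action, q_value_fn):
    """One Bayesian update of the posterior over candidate reward weightings,
    triggered when the user's action disagrees with the planner's."""
    likelihoods = np.array([
        action_likelihood(w, belief, user_action, q_value_fn)
        for w in candidate_weights
    ])
    posterior = np.asarray(prior, dtype=float) * likelihoods
    return posterior / posterior.sum()
```

Under this sketch, each observed discrepancy re-weights the candidate reward weightings toward those that would have made the user's action appear near-optimal; the Boltzmann rationality assumption is a common modeling choice, not one confirmed by the abstract.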

