An Efficient, Generalized Bellman Update For Cooperative Inverse Reinforcement Learning

06/11/2018
by Dhruv Malik, et al.

Our goal is for AI systems to correctly identify and act according to their human user's objectives. Cooperative Inverse Reinforcement Learning (CIRL) formalizes this value alignment problem as a two-player game between a human and a robot, in which only the human knows the parameters of the reward function: the robot must learn them as the interaction unfolds. Previous work showed that CIRL can be solved as a POMDP, but with an action space whose size is exponential in the size of the reward parameter space. In this work, we exploit a specific property of CIRL (the human is a full-information agent) to derive an optimality-preserving modification to the standard Bellman update; this reduces the complexity of the problem by an exponential factor and allows us to relax CIRL's assumption of human rationality. We apply this update to a variety of POMDP solvers and find that it enables us to scale CIRL to non-trivial problems, with larger reward parameter spaces and larger action spaces for both robot and human. In solutions to these larger problems, the human exhibits pedagogic (teaching) behavior, while the robot interprets it as such and attains higher value for the human.
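The complexity reduction described above can be illustrated with a minimal sketch. This is not the authors' implementation: all names (`bellman_update`, `T`, `R`, `V_next`) are hypothetical, the spaces are assumed finite, and the sketch simplifies by conditioning the next-stage value on the reward parameter theta and ignoring belief updates, to keep the example short. The point it shows is the source of the exponential saving: a naive CIRL-as-POMDP backup maximizes over joint "robot action plus human policy" pairs, i.e. over |A_H|^|Theta| human policies, whereas because the human observes theta, the human can best-respond per theta, costing only |A_H| * |Theta| evaluations per robot action.

```python
def bellman_update(V_next, b, s, A_R, A_H, Theta, T, R, gamma=0.95):
    """One reduced Bellman backup at state s under belief b over Theta.

    Illustrative sketch (hypothetical names, simplified dynamics):
      b      -- belief, a probability for each theta in Theta
      T      -- deterministic transition T(s, aH, aR) -> s'
      R      -- reward R(s, aH, aR, theta)
      V_next -- next-stage value V_next(s', theta), a simplification
                that ignores the robot's belief update.
    The inner max is the human's best response *per theta*, which is
    the exponential-to-linear reduction: |A_H| * |Theta| evaluations
    instead of |A_H| ** |Theta| human policies.
    """
    best = float("-inf")
    for aR in A_R:
        total = 0.0
        for i, theta in enumerate(Theta):
            # Human knows theta, so she best-responds to (s, aR, theta).
            total += b[i] * max(
                R(s, aH, aR, theta) + gamma * V_next(T(s, aH, aR), theta)
                for aH in A_H
            )
        best = max(best, total)
    return best
```

For instance, with two reward parameters and two human actions, the update above evaluates 2 * 2 = 4 human responses per robot action, rather than enumerating the 2^2 = 4 human policies; with |Theta| = 20 the gap becomes 40 versus roughly a million.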


