Incorrigibility in the CIRL Framework

09/19/2017
by Ryan Carey, et al.

A value learning system has incentives to follow shutdown instructions, provided that the shutdown instruction carries information (in the technical sense) about which actions lead to valuable outcomes. However, this assumption is not robust to model mis-specification (e.g., in the case of programmer errors). We demonstrate this by presenting some Supervised POMDP scenarios in which errors in the parameterized reward function remove the incentive to follow shutdown commands. These difficulties parallel those discussed by Soares et al. (2015) in their paper on corrigibility. We argue that it is important to consider systems that follow shutdown commands under some weaker set of assumptions (e.g., that one small verified module is correctly implemented, as opposed to an entire prior probability distribution and/or parameterized reward function). We discuss some difficulties with simple approaches to attaining these sorts of guarantees in a value learning framework.
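The dependence on an informative shutdown signal can be illustrated with a toy Bayesian calculation. This is only a sketch under stated assumptions, not one of the paper's Supervised POMDP scenarios: the two-outcome utility (+1 or -1 for continuing, 0 for shutting down), the probabilities, and the function names `posterior_good` and `agent_complies` are all hypothetical and chosen for illustration.

```python
def posterior_good(prior_good, p_cmd_given_good, p_cmd_given_bad):
    """P(continuing is good | shutdown command observed), by Bayes' rule."""
    numerator = prior_good * p_cmd_given_good
    evidence = numerator + (1.0 - prior_good) * p_cmd_given_bad
    return numerator / evidence


def agent_complies(prior_good, p_cmd_given_good, p_cmd_given_bad):
    """True if shutting down (utility 0) beats continuing in expectation.

    Assumption for this toy model: continuing yields +1 if it is good
    and -1 if it is bad.
    """
    post = posterior_good(prior_good, p_cmd_given_good, p_cmd_given_bad)
    expected_utility_continue = post * 1.0 + (1.0 - post) * (-1.0)
    return expected_utility_continue < 0.0


# Well-specified model: a shutdown command is much more likely when
# continuing is bad, so the command is strong evidence against continuing.
well_specified = agent_complies(0.8, p_cmd_given_good=0.1, p_cmd_given_bad=0.9)

# Mis-specified model (e.g. a programmer error): the agent treats the
# command as equally likely either way, so it learns nothing from it.
misspecified = agent_complies(0.8, p_cmd_given_good=0.5, p_cmd_given_bad=0.5)

print(well_specified, misspecified)
```

In this sketch the well-specified agent updates towards "continuing is bad" and prefers to shut down, while the mis-specified agent keeps its prior and prefers to continue. The incentive to comply comes entirely from the modelled likelihoods, which is why an error in the model can remove it.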


research
06/05/2018

Learning to Follow Language Instructions with Adversarial Reward Induction

Recent work has shown that deep reinforcement-learning agents can learn ...
research
07/24/2020

Bayesian Robust Optimization for Imitation Learning

One of the main challenges in imitation learning is determining what act...
research
09/25/2020

Deep Reinforcement Learning with Stage Incentive Mechanism for Robotic Trajectory Planning

To improve the efficiency of deep reinforcement learning (DRL) based met...
research
05/09/2012

Regret-based Reward Elicitation for Markov Decision Processes

The specification of a Markov decision process (MDP) can be difficult. Re...
research
03/23/2021

Replacing Rewards with Examples: Example-Based Policy Search via Recursive Classification

In the standard Markov decision process formalism, users specify tasks b...
research
02/05/2019

Interactively shaping robot behaviour with unlabeled human instructions

In this paper, we propose a framework that enables a human teacher to sh...
