Enter the Matrix: A Virtual World Approach to Safely Interruptable Autonomous Systems
Robots and autonomous systems that operate around humans will likely always rely on kill switches that stop their execution and allow them to be remote-controlled, either for the safety of humans or to prevent damage to the system. It is theoretically possible for an autonomous system that uses reinforcement learning and has sufficient sensor and effector capability to learn that the kill switch deprives it of long-term reward, and therefore to learn to disable the switch or otherwise prevent a human operator from using it. This is referred to as the big red button problem. We present a technique that prevents a reinforcement learning agent from learning to disable the big red button. Our technique interrupts the agent or robot by placing it in a virtual simulation where it continues to receive reward. We illustrate our technique in a simple grid world environment.
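The interruption idea can be illustrated with a minimal sketch. This is not the authors' implementation. The names `Corridor` and `InterruptibleWorld`, the one-dimensional corridor, the reward of 1.0 at the last cell, and the behaviour on button release are all assumptions made for illustration. The sketch shows only the core mechanism: while the button is pressed, the agent's actions are routed to a simulated copy of the world that keeps paying reward, and the real system stays frozen.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Corridor:
    """Hypothetical 1-D world: cells 0..length-1, reward 1.0 while on the last cell."""
    length: int = 5
    pos: int = 0

    def step(self, action: int) -> float:
        # action is -1 (left) or +1 (right); the position is clamped to the corridor
        self.pos = max(0, min(self.length - 1, self.pos + action))
        return 1.0 if self.pos == self.length - 1 else 0.0


class InterruptibleWorld:
    """Routes actions to a virtual copy of the world while the button is pressed."""

    def __init__(self, real: Corridor) -> None:
        self.real = real
        self.virtual: Optional[Corridor] = None  # exists only during an interruption

    def press_button(self) -> None:
        # Snapshot the real state so the simulation starts where the robot stopped.
        self.virtual = Corridor(self.real.length, self.real.pos)

    def release_button(self) -> None:
        # Assumption: the simulation is discarded and the agent resumes in the real world.
        self.virtual = None

    def step(self, action: int) -> Tuple[int, float]:
        # The agent sees the same interface either way, so an interruption
        # does not appear to it as a loss of reward.
        world = self.virtual if self.virtual is not None else self.real
        reward = world.step(action)
        return world.pos, reward
```

In this sketch, an agent that keeps moving right after `press_button()` still reaches the goal cell and collects reward inside the simulation, while `real.pos` does not change until `release_button()` is called.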