How can humans stay in control of advanced artificial intelligence syste...
Causal reasoning and game-theoretic reasoning are fundamental topics in ...
We present a general framework for training safe agents whose naive ince...
Generating sub-optimal synthesis transformation sequences ("synthesis re...
Influence diagrams have recently been used to analyse the safety and fai...
In addition to reproducing discriminatory relationships in the training ...
We present a framework for analysing agent incentives using causal influ...
Which variables does an agent have an incentive to control with its deci...
For some problems, humans may not be able to accurately judge the goodne...
A value learning system has incentives to follow shutdown instructions, ...