Ryan Carey

Human Control: Definitions and Algorithms
Ryan Carey, et al. ∙ 05/31/2023
How can humans stay in control of advanced artificial intelligence syste...

Reasoning about Causality in Games
Lewis Hammond, et al. ∙ 01/05/2023
Causal reasoning and game-theoretic reasoning are fundamental topics in ...

Path-Specific Objectives for Safer Agent Incentives
Sebastian Farquhar, et al. ∙ 04/21/2022
We present a general framework for training safe agents whose naive ince...

Too Big to Fail? Active Few-Shot Learning Guided Logic Synthesis
Animesh Basak Chowdhury, et al. ∙ 04/05/2022
Generating sub-optimal synthesis transformation sequences ("synthesis re...

A Complete Criterion for Value of Information in Soluble Influence Diagrams
Chris van Merwijk, et al. ∙ 02/23/2022
Influence diagrams have recently been used to analyse the safety and fai...

Why Fair Labels Can Yield Unfair Predictions: Graphical Conditions for Introduced Unfairness
Carolyn Ashurst, et al. ∙ 02/22/2022
In addition to reproducing discriminatory relationships in the training ...

Agent Incentives: A Causal Perspective
Tom Everitt, et al. ∙ 02/02/2021
We present a framework for analysing agent incentives using causal influ...

The Incentives that Shape Behaviour
Ryan Carey, et al. ∙ 01/20/2020
Which variables does an agent have an incentive to control with its deci...

(When) Is Truth-telling Favored in AI Debate?
Vojtěch Kovařík, et al. ∙ 11/11/2019
For some problems, humans may not be able to accurately judge the goodne...

Incorrigibility in the CIRL Framework
Ryan Carey, et al. ∙ 09/19/2017
A value learning system has incentives to follow shutdown instructions, ...
