Power-seeking behavior is a key source of risk from advanced AI, but our...
The field of AI alignment is concerned with AI systems that pursue unint...
How can we design agents that pursue a given objective when all feedback...
This paper describes REALab, a platform for embedded agency research in...
Designing reward functions is difficult: the designer has to specify wha...
Proposals for safe AGI systems are typically made at the level of framew...
How can we design reinforcement learning agents that avoid causing unnec...
We present a suite of reinforcement learning environments illustrating v...
No real-world reward function is perfect. Sensory errors and software bu...