Protecting Society from AI Misuse: When are Restrictions on Capabilities Warranted?

by Markus Anderljung, et al.

Artificial intelligence (AI) systems will increasingly be used to cause harm as they grow more capable. AI systems are already being used to automate fraudulent activities, violate human rights, create harmful fake images, and identify dangerous toxins. To prevent some misuses of AI, we argue that targeted interventions on certain capabilities will be warranted. These restrictions may include controlling who can access certain types of AI models, what they can be used for, whether outputs are filtered or can be traced back to their user, and the resources needed to develop them. We also contend that some restrictions on non-AI capabilities needed to cause harm will be required. Though capability restrictions risk reducing use more than misuse (facing an unfavorable Misuse-Use Tradeoff), we argue that interventions on capabilities are warranted when other interventions are insufficient, the potential harm from misuse is high, and there are targeted ways to intervene on capabilities. We provide a taxonomy of interventions that can reduce AI misuse, focusing on the specific steps required for a misuse to cause harm (the Misuse Chain), and a framework to determine whether an intervention is warranted. We apply this reasoning to three examples: predicting novel toxins, creating harmful images, and automating spear phishing campaigns.
