Aligning AI With Shared Human Values

08/05/2020
by Dan Hendrycks, et al.

We show how to assess a language model's knowledge of basic concepts of morality. We introduce the ETHICS dataset, a new benchmark that spans concepts in justice, well-being, duties, virtues, and commonsense morality. Models predict widespread moral judgments about diverse text scenarios. This requires connecting physical and social world knowledge to value judgments, a capability that may enable us to filter out needlessly inflammatory chatbot outputs or eventually regularize open-ended reinforcement learning agents. With the ETHICS dataset, we find that current language models have a promising but incomplete understanding of basic ethical knowledge. Our work shows that progress can be made on machine ethics today, and it provides a stepping stone toward AI that is aligned with human values.


