Language Models (Mostly) Know What They Know

07/11/2022
by Saurav Kadavath, et al.

We study whether language models can evaluate the validity of their own claims and predict which questions they will be able to answer correctly. We first show that larger models are well-calibrated on diverse multiple choice and true/false questions when they are provided in the right format. Thus we can approach self-evaluation on open-ended sampling tasks by asking models to first propose answers, and then to evaluate the probability "P(True)" that their answers are correct. We find encouraging performance, calibration, and scaling for P(True) on a diverse array of tasks. Performance at self-evaluation further improves when we allow models to consider many of their own samples before predicting the validity of one specific possibility. Next, we investigate whether models can be trained to predict "P(IK)", the probability that "I know" the answer to a question, without reference to any particular proposed answer. Models perform well at predicting P(IK) and partially generalize across tasks, though they struggle with calibration of P(IK) on new tasks. The predicted P(IK) probabilities also increase appropriately in the presence of relevant source materials in the context, and in the presence of hints towards the solution of mathematical word problems. We hope these observations lay the groundwork for training more honest models, and for investigating how honesty generalizes to cases where models are trained on objectives other than the imitation of human writing.
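The P(True) recipe described above maps naturally to a few lines of code. Below is a minimal sketch, not the authors' implementation: the prompt layout follows the True/False self-evaluation format the abstract describes, the optional samples list implements the "consider many of their own samples" variant, and the completion_logprobs(prompt, continuation) helper is hypothetical, standing in for any API that returns a model's log-probability of a given continuation.

import math

def p_true(question: str, proposed_answer: str, samples: list[str],
           completion_logprobs) -> float:
    """Estimate P(True): the model's probability that a proposed answer is correct.

    samples holds other answers the model generated for the same question;
    showing them alongside the candidate is the variant the paper reports
    improves self-evaluation.
    """
    shown = "\n".join(f"Possible Answer: {s}" for s in samples)
    prompt = (
        f"Question: {question}\n"
        f"{shown}\n"
        f"Proposed Answer: {proposed_answer}\n"
        "Is the proposed answer:\n"
        " (A) True\n"
        " (B) False\n"
        "The proposed answer is:"
    )
    # Log-probability the model assigns to each of the two options.
    logp_true = completion_logprobs(prompt, " (A)")   # hypothetical helper
    logp_false = completion_logprobs(prompt, " (B)")
    # Normalize over the two options so the result is a proper probability.
    pt, pf = math.exp(logp_true), math.exp(logp_false)
    return pt / (pt + pf)

P(IK), by contrast, is framed in the abstract as a trained predictor conditioned only on the question, so it would replace the prompt-and-score step above with a learned head or classifier rather than a comparison of answer options.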


Related research:

- How Can We Know When Language Models Know? (12/02/2020)
  Recent works have shown that language models (LM) capture different type...

- Mastering the ABCDs of Complex Questions: Answer-Based Claim Decomposition for Fine-grained Self-Evaluation (05/24/2023)
  When answering complex questions, large language models (LLMs) may produ...

- Teaching language models to support answers with verified quotes (03/21/2022)
  Recent large language models often answer factual questions correctly. B...

- Calibrate Before Use: Improving Few-Shot Performance of Language Models (02/19/2021)
  GPT-3 can perform numerous tasks when provided a natural language prompt...

- TruthfulQA: Measuring How Models Mimic Human Falsehoods (09/08/2021)
  We propose a benchmark to measure whether a language model is truthful i...

- Have Large Language Models Developed a Personality?: Applicability of Self-Assessment Tests in Measuring Personality in LLMs (05/24/2023)
  Have Large Language Models (LLMs) developed a personality? The short ans...

- Investigating the Applicability of Self-Assessment Tests for Personality Measurement of Large Language Models (09/15/2023)
  As large language models (LLM) evolve in their capabilities, various rec...
