Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models

06/16/2022
by   Maribeth Rauh, et al.
0

Large language models produce human-like text that drive a growing number of applications. However, recent literature and, increasingly, real world observations, have demonstrated that these models can generate language that is toxic, biased, untruthful or otherwise harmful. Though work to evaluate language model harms is under way, translating foresight about which harms may arise into rigorous benchmarks is not straightforward. To facilitate this translation, we outline six ways of characterizing harmful text which merit explicit consideration when designing new benchmarks. We then use these characteristics as a lens to identify trends and gaps in existing benchmarks. Finally, we apply them in a case study of the Perspective API, a toxicity classifier that is widely used in harm benchmarks. Our characteristics provide one piece of the bridge that translates between foresight and effective evaluation.

READ FULL TEXT

page 6

page 26

research
02/09/2020

Limits of Detecting Text Generated by Large-Scale Language Models

Some consider large-scale language models that can generate long and coh...
research
10/17/2022

Prompting GPT-3 To Be Reliable

Large language models (LLMs) show impressive abilities via few-shot prom...
research
09/16/2021

Do Language Models Know the Way to Rome?

The global geometry of language models is important for a range of appli...
research
05/23/2023

Enhancing Generation through Summarization Duality and Explicit Outline Control

Automatically open-ended long text generation poses significant challeng...
research
06/29/2023

Benchmarking Large Language Model Capabilities for Conditional Generation

Pre-trained large language models (PLMs) underlie most new developments ...
research
08/22/2023

Efficient Benchmarking (of Language Models)

The increasing versatility of language models LMs has given rise to a ne...
research
07/28/2022

Efficient Training of Language Models to Fill in the Middle

We show that autoregressive language models can learn to infill text aft...

Please sign up or login with your details

Forgot password? Click here to reset