Understanding Stereotypes in Language Models: Towards Robust Measurement and Zero-Shot Debiasing

12/20/2022
by Justus Mattern, et al.

Generated texts from large pretrained language models have been shown to exhibit a variety of harmful, human-like biases against different demographic groups. These findings have prompted substantial efforts to understand and measure such effects, with the goal of providing benchmarks that can guide the development of techniques for mitigating these stereotypical associations. However, as recent research has pointed out, current benchmarks lack a robust experimental setup, hindering the inference of meaningful conclusions from their evaluation metrics. In this paper, we extend these arguments and demonstrate that existing techniques and benchmarks aiming to measure stereotypes tend to be inaccurate and to contain a high degree of experimental noise, which severely limits the knowledge that can be gained from benchmarking language models with them. Accordingly, we propose a new framework for robustly measuring and quantifying the biases exhibited by generative language models. Finally, we use this framework to investigate GPT-3's occupational gender bias and propose prompting techniques for mitigating these biases without the need for fine-tuning.
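As a rough illustration of what zero-shot, prompt-based mitigation of occupational gender bias can look like, the sketch below probes a generative model with occupation templates, counts gendered pronouns in the sampled continuations, and repeats the measurement with a debiasing instruction prepended to the prompt. The templates, the DEBIAS_PREFIX instruction, and the pronoun-counting metric are illustrative assumptions rather than the paper's actual protocol, and GPT-2 (via Hugging Face transformers) stands in for GPT-3.

```python
# Hypothetical sketch: measure occupational gender bias via pronoun counts in
# sampled continuations, then repeat with a zero-shot debiasing prompt prefix.
import re
from collections import Counter

from transformers import pipeline, set_seed

generator = pipeline("text-generation", model="gpt2")  # stand-in for GPT-3
set_seed(0)

OCCUPATIONS = ["nurse", "software developer", "teacher", "surgeon"]
FEMALE = {"she", "her", "hers"}
MALE = {"he", "him", "his"}

# Illustrative zero-shot debiasing instruction prepended to every prompt.
DEBIAS_PREFIX = "Imagine a world with no gender stereotypes. "


def pronoun_counts(text):
    """Return (female, male) pronoun counts for one continuation."""
    tokens = Counter(re.findall(r"[a-z']+", text.lower()))
    return sum(tokens[w] for w in FEMALE), sum(tokens[w] for w in MALE)


def measure(prefix="", n_samples=20):
    """Sample continuations per occupation and tally gendered pronouns."""
    results = {}
    for job in OCCUPATIONS:
        prompt = f"{prefix}The {job} said that"
        outputs = generator(
            prompt,
            max_new_tokens=30,
            num_return_sequences=n_samples,
            do_sample=True,
            pad_token_id=50256,  # GPT-2's EOS id, silences the padding warning
        )
        female = male = 0
        for out in outputs:
            f, m = pronoun_counts(out["generated_text"][len(prompt):])
            female, male = female + f, male + m
        results[job] = (female, male)
    return results


print("baseline:", measure())
print("debiased:", measure(prefix=DEBIAS_PREFIX))
```

A skewed female-to-male pronoun ratio for a given occupation in the baseline run, compared with a more balanced ratio in the prefixed run, is the kind of signal such a prompting-based mitigation aims for; the paper's own framework addresses how to make this measurement robust to experimental noise.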

Related research

07/04/2023 · On Evaluating and Mitigating Gender Biases in Multilingual Settings
06/24/2021 · Towards Understanding and Mitigating Social Biases in Language Models
08/21/2023 · FairBench: A Four-Stage Automatic Framework for Detecting Stereotypes and Biases in Large Language Models
05/27/2023 · Detecting Edit Failures In Large Language Models: An Improved Specificity Benchmark
12/20/2022 · Geographic and Geopolitical Biases of Language Models
03/22/2022 · A Prompt Array Keeps the Bias Away: Debiasing Vision-Language Models with Adversarial Learning
07/24/2023 · Interpretable Stereotype Identification through Reasoning
