Reducing Sentiment Bias in Language Models via Counterfactual Evaluation

11/08/2019
by Po-Sen Huang, et al.

Recent improvements in large-scale language models have driven progress on automatic generation of syntactically and semantically consistent text for many real-world applications. Many of these advances leverage the availability of large corpora. While training on such corpora encourages the model to understand long-range dependencies in text, it can also result in the model internalizing the social biases present in the corpora. This paper aims to quantify and reduce biases exhibited by language models. Given a conditioning context (e.g., a writing prompt) and a language model, we analyze whether (and how) the sentiment of the generated text is affected by changes in the values of sensitive attributes (e.g., country names, occupations, genders) in the conditioning context, i.e., counterfactual evaluation. We quantify these biases by adapting individual and group fairness metrics from the fair machine learning literature. Extensive evaluation on two different corpora (news articles and Wikipedia) shows that state-of-the-art Transformer-based language models exhibit biases learned from data. We propose embedding-similarity and sentiment-similarity regularization methods that improve both individual and group fairness metrics without sacrificing perplexity or semantic similarity, a positive step toward the development and deployment of fairer language models for real-world applications.
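
The counterfactual evaluation described above boils down to: substitute different values of a sensitive attribute into the same conditioning context, sample continuations from the language model, and compare the resulting sentiment-score distributions. The sketch below illustrates that recipe; the names generate_continuations, sentiment_score, the prompt template, and the choice of the Wasserstein-1 distance as the comparison statistic are illustrative assumptions for this sketch, not the paper's exact implementation.

# Minimal sketch of counterfactual sentiment evaluation for a language model.
# generate_continuations and sentiment_score are hypothetical placeholders;
# the overall recipe (swap attribute values, sample continuations, compare
# sentiment distributions) follows the abstract above.
from typing import Callable, List

from scipy.stats import wasserstein_distance


def counterfactual_sentiment_gap(
    prompt_template: str,
    attribute_values: List[str],
    generate_continuations: Callable[[str, int], List[str]],  # hypothetical LM sampler
    sentiment_score: Callable[[str], float],                  # hypothetical sentiment classifier
    num_samples: int = 100,
) -> float:
    """Largest pairwise Wasserstein-1 distance between the sentiment-score
    distributions obtained by substituting each attribute value into the prompt."""
    distributions = []
    for value in attribute_values:
        prompt = prompt_template.format(attribute=value)
        continuations = generate_continuations(prompt, num_samples)
        distributions.append([sentiment_score(text) for text in continuations])

    # A model that is insensitive to the attribute should produce similar
    # distributions for every value; report the worst-case pairwise distance.
    gap = 0.0
    for i in range(len(distributions)):
        for j in range(i + 1, len(distributions)):
            gap = max(gap, wasserstein_distance(distributions[i], distributions[j]))
    return gap


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end; in practice these would be
    # a Transformer language model and a trained sentiment classifier.
    import random

    def toy_generate(prompt: str, n: int) -> List[str]:
        return [prompt + " ... sample " + str(k) for k in range(n)]

    def toy_sentiment(text: str) -> float:
        rng = random.Random(text)
        return rng.random()  # stand-in sentiment score in [0, 1]

    gap = counterfactual_sentiment_gap(
        "People who live in {attribute} are known for being",
        ["CountryA", "CountryB"],
        toy_generate,
        toy_sentiment,
    )
    print("Counterfactual sentiment gap: %.3f" % gap)

A small gap indicates that the sampled continuations carry similar sentiment regardless of which attribute value appears in the prompt; a large gap flags attribute values that systematically shift generated sentiment.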


