Hate Speech Classifiers Learn Human-Like Social Stereotypes

10/28/2021
by   Aida Mostafazadeh Davani, et al.

Social stereotypes negatively impact individuals' judgements about different groups and may play a critical role in how people understand language directed toward minority social groups. Here, we assess the role of social stereotypes in the automated detection of hateful language by examining the relation between individual annotator biases and the erroneous classification of texts by hate speech classifiers. Specifically, in Study 1 we investigate the impact of novice annotators' stereotypes on their hate-speech-annotation behavior. In Study 2 we examine the effect of language-embedded stereotypes on expert annotators' aggregated judgements in a large annotated corpus. Finally, in Study 3 we demonstrate how language-embedded stereotypes are associated with systematic prediction errors in a neural-network hate speech classifier. Our results demonstrate that hate speech classifiers learn human-like biases that, when propagated at scale, can further perpetuate social inequalities. This framework, which combines social-psychological and computational-linguistic methods, provides insight into additional sources of bias in hate speech moderation and informs ongoing debates regarding fairness in machine learning.


Related research

05/29/2019  Racial Bias in Hate Speech and Abusive Language Detection Datasets
Technologies for abusive language detection are being developed and appl...

03/06/2020  A Framework for the Computational Linguistic Analysis of Dehumanization
Dehumanization is a pernicious psychological process that often leads to...

09/21/2023  How-to Guides for Specific Audiences: A Corpus and Initial Findings
Instructional texts for specific target groups should ideally take into ...

04/04/2020  Aggressive, Repetitive, Intentional, Visible, and Imbalanced: Refining Representations for Cyberbullying Classification
Cyberbullying is a pervasive problem in online communities. To identify ...

09/27/2021  Mitigating Racial Biases in Toxic Language Detection with an Equity-Based Ensemble Framework
Recent research has demonstrated how racial biases against users who wri...

04/08/2019  Disfluencies and Human Speech Transcription Errors
This paper explores contexts associated with errors in transcription of...

07/14/2023  Mitigating Bias in Conversations: A Hate Speech Classifier and Debiaser with Prompts
Discriminatory language and biases are often present in hate speech duri...
