How Different Is Stereotypical Bias Across Languages?

07/14/2023
by   Ibrahim Tolga Öztürk, et al.
0

Recent studies have demonstrated how to assess the stereotypical bias in pre-trained English language models. In this work, we extend this branch of research in multiple different dimensions by systematically investigating (a) mono- and multilingual models of (b) different underlying architectures with respect to their bias in (c) multiple different languages. To that end, we make use of the English StereoSet data set (Nadeem et al., 2021), which we semi-automatically translate into German, French, Spanish, and Turkish. We find that it is of major importance to conduct this type of analysis in a multilingual setting, as our experiments show a much more nuanced picture as well as notable differences from the English-only analysis. The main takeaways from our analysis are that mGPT-2 (partly) shows surprising anti-stereotypical behavior across languages, English (monolingual) models exhibit the strongest bias, and the stereotypes reflected in the data set are least present in Turkish models. Finally, we release our codebase alongside the translated data sets and practical guidelines for the semi-automatic translation to encourage a further extension of our work to other languages.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
02/24/2021

Re-Evaluating GermEval17 Using German Pre-Trained Language Models

The lack of a commonly used benchmark data set (collection) such as (Sup...
research
02/22/2018

LIDIOMS: A Multilingual Linked Idioms Data Set

In this paper, we describe the LIDIOMS data set, a multilingual RDF repr...
research
05/31/2023

Automatic Discrimination of Human and Neural Machine Translation in Multilingual Scenarios

We tackle the task of automatically discriminating between human and mac...
research
11/14/2022

Speaking Multiple Languages Affects the Moral Bias of Language Models

Pre-trained multilingual language models (PMLMs) are commonly used when ...
research
05/29/2023

BigTrans: Augmenting Large Language Models with Multilingual Translation Capability over 100 Languages

Large language models (LLMs) demonstrate promising translation performan...
research
09/13/2021

Mitigating Language-Dependent Ethnic Bias in BERT

BERT and other large-scale language models (LMs) contain gender and raci...
research
09/20/2021

Model Bias in NLP – Application to Hate Speech Classification

This document sums up our results forthe NLP lecture at ETH in the sprin...

Please sign up or login with your details

Forgot password? Click here to reset