Towards Lexical Gender Inference: A Scalable Methodology using Online Databases

06/28/2022
by   Marion Bartl, et al.

This paper presents a new method for automatically detecting words with lexical gender in large-scale language datasets. Currently, the evaluation of gender bias in natural language processing relies on manually compiled lexicons of gendered expressions, such as pronouns ('he', 'she', etc.) and nouns with lexical gender ('mother', 'boyfriend', 'policewoman', etc.). However, manually compiled lists become static if they are not periodically updated, and compiling them often involves value judgments by individual annotators and researchers. Moreover, terms not included in a list fall outside the range of analysis. To address these issues, we devised a scalable, dictionary-based method to automatically detect lexical gender that can provide a dynamic, up-to-date analysis with high coverage. Our approach reaches over 80% accuracy in determining the lexical gender of nouns retrieved randomly from a Wikipedia sample and when testing on a list of gendered words used in previous research.
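
To make the idea of dictionary-based detection concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a word's dictionary definition is available as plain text and that lexical gender can be guessed by counting gendered cue words in that definition. The cue-word sets, the toy in-memory dictionary, and the function name `lexical_gender` are all hypothetical; the abstract does not specify which databases are queried or how definitions are scored.

```python
import re

# Hypothetical cue words (an assumption; the abstract does not list any).
MASCULINE_CUES = {"man", "male", "boy", "father", "son", "he", "his"}
FEMININE_CUES = {"woman", "female", "girl", "mother", "daughter", "she", "her"}

# Toy stand-in for an online dictionary lookup (assumed data for illustration).
TOY_DICTIONARY = {
    "mother": "a female parent",
    "policewoman": "a woman who is a member of a police force",
    "boyfriend": "a male friend or romantic partner",
    "teacher": "a person who teaches, especially in a school",
}


def lexical_gender(word, dictionary=TOY_DICTIONARY):
    """Guess lexical gender from gendered cue words in a definition."""
    definition = dictionary.get(word.lower())
    if definition is None:
        # The word is not covered by the dictionary at all.
        return "unknown"
    # Lowercase the definition and split it into alphabetic tokens.
    tokens = re.findall(r"[a-z]+", definition.lower())
    masculine = sum(token in MASCULINE_CUES for token in tokens)
    feminine = sum(token in FEMININE_CUES for token in tokens)
    if masculine > feminine:
        return "masculine"
    if feminine > masculine:
        return "feminine"
    # No cues, or an equal number of both kinds.
    return "neutral"


if __name__ == "__main__":
    for noun in ["mother", "policewoman", "boyfriend", "teacher", "blorp"]:
        print(noun, lexical_gender(noun))
```

Replacing the toy dictionary with a live lookup is what would make such an approach dynamic: the label for a word changes whenever its dictionary entry does, without anyone editing a hand-built list.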

