Dataset for Identification of Homophobia and Transphobia in Multilingual YouTube Comments

The increased proliferation of abusive content on social media platforms has a negative impact on online users. Homophobia/transphobia is defined as the dread, dislike, discomfort, or mistrust of lesbian, gay, bisexual, or transgender persons. Homophobic/transphobic speech is a type of offensive language that may be summarized as hate speech directed toward LGBT+ people, and it has been a growing concern in recent years. Online homophobia/transphobia is a severe societal problem that can make online platforms toxic and unwelcoming to LGBT+ people while also undermining equality, diversity, and inclusion. We provide a new hierarchical taxonomy for online homophobia and transphobia, as well as an expert-labelled dataset that will allow homophobic/transphobic content to be automatically identified. Because this is a sensitive issue, and because we previously found that untrained crowdsourcing annotators struggle to identify homophobia due to cultural and other prejudices, we trained annotators and supplied them with comprehensive annotation guidelines. The dataset comprises 15,141 annotated multilingual comments. This paper describes the process of building the dataset, a qualitative analysis of the data, and inter-annotator agreement. In addition, we create baseline models for the dataset. To the best of our knowledge, our dataset is the first such dataset created. Warning: This paper contains explicit statements of homophobia, transphobia, and stereotypes that may be distressing to some readers.


DravidianCodeMix: Sentiment Analysis and Offensive Language Identification Dataset for Dravidian Languages in Code-Mixed Text

This paper describes the development of a multilingual, manually annotat...

Toxic Language Detection in Social Media for Brazilian Portuguese: New Dataset and Multilingual Analysis

Hate speech and toxic comments are a common concern of social media plat...

Hate Speech Dataset from a White Supremacy Forum

Hate speech is commonly defined as any communication that disparages a t...

Developing a Multilingual Annotated Corpus of Misogyny and Aggression

In this paper, we discuss the development of a multilingual annotated co...

SemEval-2023 Task 10: Explainable Detection of Online Sexism

Online sexism is a widespread and harmful phenomenon. Automated tools ca...

ETHOS: an Online Hate Speech Detection Dataset

Online hate speech is a newborn problem in our modern society which is g...

Towards Understanding of Deepfake Videos in the Wild

Deepfakes have become a growing concern in recent years, prompting resea...
