HaT5: Hate Language Identification using Text-to-Text Transfer Transformer

02/11/2022
by Sana Sabah Sabry, et al.

We investigate the performance of a state-of-the-art (SoTA) architecture, T5 (available on the SuperGLUE leaderboard), and compare it with 3 other previous SoTA architectures across 5 different tasks from 2 relatively diverse datasets. The datasets are diverse in terms of the number and types of tasks they have. To improve performance, we augment the training data by using an autoregressive model. We achieve near-SoTA results on a couple of the tasks: macro F1 scores of 81.66 speech and offensive content (HASOC) 2021 dataset, where SoTA are 82.9 83.05 models (Bi-LSTM) makes the predictions it does by using a publicly available algorithm: Integrated Gradients (IG). This is because explainable artificial intelligence (XAI) is essential for earning the trust of users. The main contributions of this work are: the implementation method of T5, which we discuss; the data augmentation using a new conversational AI model checkpoint, which brought performance improvements; and the revelation of the shortcomings of the HASOC 2021 dataset. It reveals the difficulties caused by poor data annotation, using a small set of examples where the T5 model made the correct predictions even when the ground truth of the test set was incorrect (in our opinion). We also provide our model checkpoints on the HuggingFace hub to foster transparency.
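
The results above are reported as macro F1 on a 0 to 100 scale. As a reminder of what that metric averages, here is a minimal pure-Python sketch: per-class F1 is computed for every label and then averaged without weighting by class frequency, so a minority class (often the hateful one) counts as much as the majority class. The label names `HOF`/`NOT` and the toy predictions are illustrative assumptions, not data from the paper; on the same inputs the result should agree with scikit-learn's `f1_score(..., average="macro")`.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # predicted p, but it was wrong
            fn[t] += 1          # missed the true class t
    f1s = []
    for c in labels:
        # F1 = 2TP / (2TP + FP + FN), equivalent to the harmonic mean of P and R
        denom = 2 * tp[c] + fp[c] + fn[c]
        f1s.append(2 * tp[c] / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


if __name__ == "__main__":
    # Hypothetical gold labels and predictions, for illustration only
    y_true = ["NOT", "HOF", "HOF", "NOT"]
    y_pred = ["NOT", "HOF", "NOT", "NOT"]
    print(round(100 * macro_f1(y_true, y_pred), 2))  # prints 73.33
```

Here `NOT` gets F1 = 0.8 and `HOF` gets F1 ≈ 0.667; macro F1 is their plain average, whereas micro F1 (equal to accuracy in single-label classification) would be 0.75.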
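
The abstract says a publicly available Integrated Gradients implementation was used to explain the Bi-LSTM's predictions. The sketch below illustrates only the technique itself, under stated assumptions: it approximates IG with a midpoint Riemann sum along the straight path from a baseline to the input, for a toy logistic model `sigmoid(w . x)` whose gradient is known in closed form. The weights, inputs, all-zeros baseline and the helper names `model`, `model_grad` and `integrated_gradients` are hypothetical; the paper's actual setup (a Bi-LSTM over token inputs, and whichever IG library it used) is not reproduced here.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def model(x, w):
    # Hypothetical toy classifier: probability of the positive class = sigmoid(w . x)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def model_grad(x, w):
    # Closed-form gradient of sigmoid(w . x) w.r.t. x: s * (1 - s) * w
    s = model(x, w)
    return [s * (1.0 - s) * wi for wi in w]


def integrated_gradients(x, baseline, w, steps=200):
    """Midpoint Riemann approximation of IG along the straight path baseline -> x."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps                      # midpoint of the k-th interval
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = model_grad(point, w)
        total = [t + gi for t, gi in zip(total, g)]
    # IG_i = (x_i - baseline_i) * average gradient_i along the path
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


if __name__ == "__main__":
    w = [2.0, -1.0, 0.5]        # hypothetical weights
    x = [1.0, 0.5, 2.0]         # hypothetical input features
    baseline = [0.0, 0.0, 0.0]  # all-zeros baseline (common, but not mandatory)
    attr = integrated_gradients(x, baseline, w)
    print(attr)
    # Completeness check: attributions should sum to f(x) - f(baseline)
    print(sum(attr), model(x, w) - model(baseline, w))
```

By the completeness property of IG, the attributions sum approximately to `model(x, w) - model(baseline, w)`, so comparing the two printed values is a quick sanity check when choosing `steps`. A feature with a negative weight (the second one here) receives a negative attribution, i.e. it pushes the prediction away from the positive class.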

research
10/11/2022

T5 for Hate Speech, Augmented Data and Ensemble

We conduct relatively extensive investigations of automatic hate speech ...
research
02/11/2023

A novel approach to generate datasets with XAI ground truth to evaluate image models

With the increased usage of artificial intelligence (AI), it is imperati...
research
09/25/2020

BET: A Backtranslation Approach for Easy Data Augmentation in Transformer-based Paraphrase Identification Context

Newly-introduced deep learning architectures, namely BERT, XLNet, RoBERT...
research
05/31/2021

LIIR at SemEval-2021 task 6: Detection of Persuasion Techniques In Texts and Images using CLIP features

We describe our approach for SemEval-2021 task 6 on detection of persuas...
research
07/25/2023

ForestMonkey: Toolkit for Reasoning with AI-based Defect Detection and Classification Models

Artificial intelligence (AI) reasoning and explainable AI (XAI) tasks ha...
