Cost-Sensitive BERT for Generalisable Sentence Classification with Imbalanced Data

03/16/2020
by Harish Tayyar Madabushi, et al.

The automatic identification of propaganda has gained significance in recent years due to technological and social changes in the way news is generated and consumed. It is not surprising that this task can be addressed effectively using BERT, a powerful new architecture that can be fine-tuned for text classification. However, propaganda detection, like other tasks that deal with news documents and other forms of decontextualised social communication (e.g. sentiment analysis), inherently deals with data whose categories are simultaneously imbalanced and dissimilar. We show that BERT, while capable of handling imbalanced classes with no additional data augmentation, does not generalise well when the training and test data are sufficiently dissimilar (as is often the case with news sources, whose topics evolve over time). We show how to address this problem by providing a statistical measure of similarity between datasets and a method of incorporating cost-weighting into BERT when the training and test sets are dissimilar. We test these methods on the Propaganda Techniques Corpus (PTC) and achieve the second-highest score on sentence-level propaganda classification.

research
06/12/2023

Imbalanced Multi-label Classification for Business-related Text with Moderately Large Label Spaces

In this study, we compared the performance of four different methods for...
research
10/31/2021

FinEAS: Financial Embedding Analysis of Sentiment

We introduce a new language representation model in finance called Finan...
research
08/24/2020

syrapropa at SemEval-2020 Task 11: BERT-based Models Design For Propagandistic Technique and Span Detection

This paper describes the BERT-based models proposed for two subtasks in ...
research
01/10/2022

BERT for Sentiment Analysis: Pre-trained and Fine-Tuned Alternatives

BERT has revolutionized the NLP field by enabling transfer learning with...
research
03/22/2023

Analyzing the Generalizability of Deep Contextualized Language Representations For Text Classification

This study evaluates the robustness of two state-of-the-art deep context...
research
09/05/2023

Leveraging BERT Language Models for Multi-Lingual ESG Issue Identification

Environmental, Social, and Governance (ESG) has been used as a metric to...
research
02/15/2022

BLUE at Memotion 2.0 2022: You have my Image, my Text and my Transformer

Memes are prevalent on the internet and continue to grow and evolve alon...