Two-Stage Classifier for COVID-19 Misinformation Detection Using BERT: a Study on Indonesian Tweets

by Douglas Raevan Faisal, et al.

The COVID-19 pandemic has had a significant global impact since the beginning of 2020. It caused a great deal of confusion in society, especially through the spread of misinformation on social media. Although several studies have already addressed the detection of misinformation in social media data, most of them focused on English-language datasets. Research on COVID-19 misinformation detection in Indonesian is still scarce. Therefore, in this research, we collect and annotate a dataset of Indonesian tweets and build prediction models for detecting COVID-19 misinformation that take the tweet's relevance into account. The dataset was constructed by a team of annotators who labeled each tweet for relevance and for misinformation. In this study, we propose a two-stage classifier model using the IndoBERT pre-trained language model for the tweet misinformation detection task. We also experiment with several other baseline models for text classification. The experimental results show that the combination of the BERT sequence classifier for relevance prediction and Bi-LSTM for misinformation detection outperformed other machine learning models with an accuracy of 87.02%. [...] contributes to the higher performance of most prediction models. We release a high-quality COVID-19 misinformation tweet corpus in the Indonesian language, as indicated by the high inter-annotator agreement.
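
The control flow of a two-stage classifier of this kind can be illustrated with a minimal sketch. It is not the authors' implementation: the names `two_stage_predict`, `TwoStageResult`, `toy_relevance`, and `toy_misinformation` are hypothetical, the keyword rules are placeholders standing in for trained models (the abstract names a BERT sequence classifier for relevance and a Bi-LSTM for misinformation), and the choice to skip the second stage for irrelevant tweets is an assumption about how the two stages are chained.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical type: a classifier maps tweet text to a yes/no decision.
Classifier = Callable[[str], bool]


@dataclass
class TwoStageResult:
    text: str
    relevant: bool
    # None means stage 2 was skipped because the tweet was judged irrelevant.
    misinformation: Optional[bool]


def two_stage_predict(
    tweets: List[str],
    is_relevant: Classifier,
    is_misinformation: Classifier,
) -> List[TwoStageResult]:
    """Run stage 1 (relevance) on every tweet, and stage 2
    (misinformation) only on tweets that stage 1 marked as relevant.
    The gating rule is an assumption made for this sketch."""
    results = []
    for text in tweets:
        if not is_relevant(text):
            results.append(TwoStageResult(text, False, None))
            continue
        results.append(TwoStageResult(text, True, is_misinformation(text)))
    return results


# Placeholder stand-ins for trained models; keyword rules are illustrative only.
def toy_relevance(text: str) -> bool:
    return "covid" in text.lower()


def toy_misinformation(text: str) -> bool:
    return "miracle cure" in text.lower()


demo = two_stage_predict(
    [
        "Cuaca hari ini cerah",
        "COVID-19 vaccines are available at the clinic",
        "Garlic is a miracle cure for COVID-19",
    ],
    toy_relevance,
    toy_misinformation,
)

for result in demo:
    print(result.relevant, result.misinformation, result.text)
```

In practice the two callables would wrap trained models; keeping them as plain functions makes it possible to swap either stage independently, which mirrors the abstract's comparison of different model combinations.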




