Topic Modelling of Swedish Newspaper Articles about Coronavirus: a Case Study using Latent Dirichlet Allocation Method

by Bernadeta Griciūtė, et al.

Topic Modelling (TM) is a research branch of natural language understanding (NLU) and natural language processing (NLP) that facilitates insightful analysis of large documents and datasets, such as summarising the main topics and tracking how topics change. This kind of discovery is becoming more popular in real-life applications because of its impact on big data analytics. In this study, from the social-media and healthcare domain, we apply popular Latent Dirichlet Allocation (LDA) methods to model topic changes in Swedish newspaper articles about Coronavirus. We describe the corpus we created, which includes 6515 articles, the methods applied, and statistics on topic changes over a period of approximately one year and two months, from 17 January 2020 to 13 March 2021. We hope this work can serve as a grounding for applications of topic modelling and inspire similar case studies in an era of pandemics, in support of socio-economic impact research as well as clinical and healthcare analytics. Our data and source code are openly available at https://github.com/poethan/Swed_Covid_TM

Keywords: Latent Dirichlet Allocation (LDA); Topic Modelling; Coronavirus; Pandemics; Natural Language Understanding
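To make the idea of modelling topic changes with LDA concrete, here is a minimal sketch in Python. It is not the authors' pipeline (their code is in the repository linked above); it assumes a plain collapsed Gibbs sampler, and the function names `lda_gibbs` and `topic_share_by_period`, the toy tokens, the month labels, and the `alpha`/`beta` values are all invented for illustration.

```python
import random
from collections import Counter


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenised documents with collapsed Gibbs sampling.

    Returns (theta, topic_word, vocab), where theta[d][k] is the estimated
    share of topic k in document d. alpha and beta are assumed priors.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]              # counts per (doc, topic)
    topic_word = [[0] * n_words for _ in range(n_topics)]   # counts per (topic, word)
    topic_total = [0] * n_topics                            # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    return theta, topic_word, vocab


def topic_share_by_period(theta, periods):
    """Average the per-document topic shares within each period label."""
    counts = Counter(periods)
    sums = {}
    for row, period in zip(theta, periods):
        acc = sums.setdefault(period, [0.0] * len(row))
        for k, share in enumerate(row):
            acc[k] += share
    return {p: [x / counts[p] for x in acc] for p, acc in sums.items()}


# Toy corpus (invented): each document is a token list with a month label.
docs = [
    ["virus", "outbreak", "travel", "virus"],
    ["travel", "ban", "outbreak"],
    ["vaccine", "dose", "vaccine", "trial"],
    ["dose", "vaccine", "rollout"],
]
months = ["2020-01", "2020-01", "2021-03", "2021-03"]

theta, topic_word, vocab = lda_gibbs(docs, n_topics=2)
shares = topic_share_by_period(theta, months)
for month in sorted(shares):
    print(month, [round(s, 2) for s in shares[month]])
```

Averaging the document-level topic shares per month gives one simple way to read off topic change over time. A real study on thousands of articles would more likely rely on an established LDA library and on language-specific preprocessing for Swedish.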




n-stage Latent Dirichlet Allocation: A Novel Approach for LDA

Nowadays, data analysis has become a problem as the amount of data is co...

Topic Modelling and Event Identification from Twitter Textual Data

The tremendous growth of social media content on the Internet has inspir...

Hierarchical Dirichlet process for tracking complex topical structure evolution and its application to autism research literature

In this paper we describe a novel framework for the discovery of the top...

Latent Dirichlet Allocation with Residual Convolutional Neural Network Applied in Evaluating Credibility of Chinese Listed Companies

This project demonstrated a methodology to estimating cooperate credibil...

Sex, drugs, and violence

Automatically detecting inappropriate content can be a difficult NLP tas...

Sentiment Analysis for Measuring Hope and Fear from Reddit Posts During the 2022 Russo-Ukrainian Conflict

This paper proposes a novel lexicon-based unsupervised sentimental analy...

Sparse Parallel Training of Hierarchical Dirichlet Process Topic Models

Nonparametric extensions of topic models such as Latent Dirichlet Alloca...
