GeoCoV19: A Dataset of Hundreds of Millions of Multilingual COVID-19 Tweets with Location Information

05/22/2020
by   Umair Qazi, et al.

The past several years have witnessed a huge surge in the use of social media platforms during mass convergence events such as health emergencies and natural or human-induced disasters. These non-traditional data sources are becoming vital for disease forecasting and surveillance when preparing for epidemic and pandemic outbreaks. In this paper, we present GeoCoV19, a large-scale Twitter dataset containing more than 524 million multilingual tweets posted over a period of 90 days starting February 1, 2020. Moreover, we employ a gazetteer-based approach to infer the geolocation of tweets. We postulate that this large-scale, multilingual, geolocated social media data can empower research communities to evaluate how societies are collectively coping with this unprecedented global crisis, and to develop computational methods for challenges such as identifying fake news, understanding communities' knowledge gaps, and building disease forecast and surveillance models.
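
The abstract does not spell out how the gazetteer-based inference works, so the following is only a minimal sketch of the general idea: scan a tweet's text for word sequences that appear in a table of known place names. The toy `GAZETTEER` table, the `MAX_NGRAM` limit, and the `infer_location` helper are all hypothetical names introduced for illustration and are not taken from the paper.

```python
import re

# Hypothetical toy gazetteer: lowercase place name -> (country code, lat, lon).
# A real gazetteer would hold far more entries and alternate spellings.
GAZETTEER = {
    "new york": ("US", 40.7128, -74.0060),
    "london": ("GB", 51.5074, -0.1278),
    "doha": ("QA", 25.2854, 51.5310),
}

# Longest place name, in words, that the lookup will try (an assumption).
MAX_NGRAM = 3


def infer_location(text):
    """Return (name, entry) for the first gazetteer match, or None.

    Longer word sequences are tried first, so a multi-word name such as
    "new york" is preferred over any shorter name contained in it.
    """
    tokens = re.findall(r"\w+", text.lower())
    for n in range(MAX_NGRAM, 0, -1):
        for i in range(len(tokens) - n + 1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in GAZETTEER:
                return candidate, GAZETTEER[candidate]
    return None


print(infer_location("Lockdown begins in New York today"))
print(infer_location("Stay home, stay safe"))
```

A sketch like this ignores ambiguity (many places share a name) and other location signals such as user profiles or tweet metadata, which a production pipeline would likely need to handle.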


research
06/27/2022

A Multilingual Dataset of COVID-19 Vaccination Attitudes on Twitter

Vaccine hesitancy is considered as one main cause of the stagnant uptake...
research
10/04/2021

TBCOV: Two Billion Multilingual COVID-19 Tweets with Sentiment, Entity, Geo, and Gender Labels

The widespread usage of social networks during mass convergence events, ...
research
01/11/2021

Model Generalization on COVID-19 Fake News Detection

Amid the pandemic COVID-19, the world is facing unprecedented infodemic ...
research
04/09/2021

The Burden of Being a Bridge: Understanding the Role of Multilingual Users during the COVID-19 Pandemic

The outbreak of the COVID-19 pandemic triggers infodemic over online soc...
research
06/09/2020

EPIC: An Epidemics Corpus Of Over 20 Million Relevant Tweets

Since the start of COVID-19, several relevant corpora from various sourc...
research
06/09/2020

EPIC30M: An Epidemics Corpus Of Over 30 Million Relevant Tweets

Since the start of COVID-19, several relevant corpora from various sourc...
research
12/05/2019

EviDense: a Graph-based Method for Finding Unique High-impact Events with Succinct Keyword-based Descriptions

Despite the significant efforts made by the research community in recent...
