Standardizing and Benchmarking Crisis-related Social Media Datasets for Humanitarian Information Processing

04/14/2020
by Firoj Alam et al.

Time-critical analysis of social media streams is important for humanitarian organizations to plan rapid response during disasters. The crisis informatics research community has developed several techniques and systems to process and classify big crisis data on social media. However, because the literature uses a wide variety of datasets, it is not possible to compare results or to measure progress towards better models for crisis classification. In this work, we attempt to bridge this gap by providing a standard crisis-related dataset. We consolidate the labels of 8 annotated data sources and provide 166.1k and 141.5k tweets for the informativeness and humanitarian classification tasks, respectively. The consolidation also results in a larger dataset, which helps in training stronger models. We also provide baseline results using CNN and BERT models. We make the dataset available at https://crisisnlp.qcri.org/crisis_datasets_benchmarks.html
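The abstract does not spell out how labels from the 8 sources were merged. As a rough illustration only, consolidation of this kind can be sketched as two steps: map each source's label vocabulary onto one shared label set, then deduplicate tweets that appear in several sources by tweet ID. The source names, label strings, and the "drop the tweet on conflicting labels" policy below are hypothetical assumptions for the sketch, not the authors' actual mapping or procedure.

```python
# Hypothetical per-source label vocabularies mapped onto one shared scheme.
# Source names and label strings are illustrative assumptions, not the
# dataset's real label sets.
LABEL_MAP = {
    "source_a": {"relevant": "informative", "not_relevant": "not_informative"},
    "source_b": {
        "related_and_informative": "informative",
        "related_but_not_informative": "not_informative",
        "not_related": "not_informative",
    },
}


def consolidate(records):
    """Map (source, tweet_id, text, label) records onto the shared label set.

    Tweets that occur in several sources are merged by tweet_id. If the
    merged copies disagree on the consolidated label, the tweet is dropped
    (one possible, assumed, conflict policy). Returns {tweet_id: (text, label)}.
    """
    labels_by_id = {}
    text_by_id = {}
    for source, tweet_id, text, label in records:
        mapped = LABEL_MAP.get(source, {}).get(label)
        if mapped is None:
            continue  # unknown source or label: skip rather than guess
        labels_by_id.setdefault(tweet_id, set()).add(mapped)
        text_by_id.setdefault(tweet_id, text)
    return {
        tid: (text_by_id[tid], next(iter(labels)))
        for tid, labels in labels_by_id.items()
        if len(labels) == 1  # keep only tweets with one agreed label
    }


# Toy usage with made-up records.
records = [
    ("source_a", "1", "Bridge collapsed on Main St", "relevant"),
    ("source_b", "1", "Bridge collapsed on Main St", "related_and_informative"),
    ("source_b", "2", "Thoughts and prayers", "related_but_not_informative"),
    ("source_a", "3", "Roads flooded downtown", "relevant"),
    ("source_b", "3", "Roads flooded downtown", "not_related"),
]
data = consolidate(records)
```

Here tweet "1" is kept once with the agreed label, tweet "2" is kept as is, and tweet "3" is dropped because its two sources map to different labels. A real pipeline might instead resolve conflicts by majority vote or manual review; the choice affects the final dataset size.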
