MACRONYM: A Large-Scale Dataset for Multilingual and Multi-Domain Acronym Extraction

02/19/2022
by Amir Pouran Ben Veyseh, et al.

Acronym extraction (AE) is the task of identifying acronyms and their expanded forms in text, and it is necessary for various NLP applications. Despite major progress on this task in recent years, existing AE research is limited to the English language and certain domains (i.e., scientific and biomedical). As such, the challenges of AE in other languages and domains remain largely unexplored. The lack of annotated datasets in multiple languages and domains has been a major obstacle to research in this area. To address this limitation, we propose a new dataset for multilingual, multi-domain AE. Specifically, 27,200 sentences in 6 typologically different languages and 2 domains, i.e., legal and scientific, are manually annotated for AE. Our extensive experiments on the proposed dataset show that AE in different languages and different learning settings has unique challenges, emphasizing the necessity of further research on multilingual and multi-domain AE.


