ADBCMM : Acronym Disambiguation by Building Counterfactuals and Multilingual Mixing

12/08/2021
by   Yixuan Weng, et al.

Scientific documents often contain a large number of acronyms. Disambiguating these acronyms helps researchers better understand the vocabulary used in the documents. In the past, thanks to the large amount of data available from English literature, the acronym disambiguation task was mainly applied to English. For low-resource languages, however, good performance is difficult to obtain, and the task receives less attention because large amounts of annotated data are lacking. To address this issue, this paper proposes a new method for acronym disambiguation, named ADBCMM, which can significantly improve performance on low-resource languages by building counterfactuals and multilingual mixing. Specifically, by balancing the data bias in the low-resource language, ADBCMM is able to improve test performance on data outside the training set. In SDU@AAAI-22 - Shared Task 2: Acronym Disambiguation, the proposed method won first place in French and Spanish. The results can be reproduced with the code at https://github.com/WENGSYX/ADBCMM.
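
The abstract names two ideas, building counterfactuals and multilingual mixing, without giving details. The sketch below is only one possible reading, not the authors' implementation: it assumes that acronym disambiguation is cast as scoring (sentence, candidate expansion) pairs, that "counterfactual" examples are the same sentence paired with a wrong expansion, and that "mixing" means pooling training data across languages while oversampling the low-resource one. The function names, the field names, and the toy data are all hypothetical.

```python
import random
from collections import Counter


def build_candidate_pairs(example, dictionary):
    """Pair one sentence with every candidate expansion of its acronym.

    Assumption: the gold expansion gets label 1 and every other candidate
    (a contrasting, "counterfactual" pairing) gets label 0.
    """
    return [
        {
            "lang": example["lang"],
            "sentence": example["sentence"],
            "acronym": example["acronym"],
            "expansion": candidate,
            "label": int(candidate == example["gold"]),
        }
        for candidate in dictionary[example["acronym"]]
    ]


def mix_languages(datasets, target_lang, seed=0):
    """Pool examples from all languages into one shuffled training set.

    Assumption: the low-resource target language is oversampled (with
    replacement) until it matches the size of the largest language.
    """
    rng = random.Random(seed)
    largest = max(len(rows) for rows in datasets.values())
    mixed = []
    for lang, rows in datasets.items():
        mixed.extend(rows)
        if lang == target_lang and rows and len(rows) < largest:
            mixed.extend(rng.choices(rows, k=largest - len(rows)))
    rng.shuffle(mixed)
    return mixed


# Hypothetical toy data, for illustration only.
dictionary = {"CNN": ["convolutional neural network", "cable news network"]}
english = [
    {"lang": "en", "sentence": f"Sample {i}: we train a CNN.", "acronym": "CNN",
     "gold": "convolutional neural network"}
    for i in range(3)
]
french = [
    {"lang": "fr", "sentence": "Nous entraînons un CNN.", "acronym": "CNN",
     "gold": "convolutional neural network"}
]

mixed = mix_languages({"en": english, "fr": french}, target_lang="fr")
pairs = [pair for example in mixed for pair in build_candidate_pairs(example, dictionary)]
language_counts = Counter(example["lang"] for example in mixed)
```

In this toy setup the pooled set contains as many French as English examples, and each sentence yields one positive and one negative pair. Whether the actual method balances data this way should be checked against the repository linked above.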


research
12/18/2021

Cascading Adaptors to Leverage English Data to Improve Performance of Question Answering for Low-Resource Languages

Transformer based architectures have shown notable results on many down ...
research
04/14/2020

Deep Learning Models for Multilingual Hate Speech Detection

Hate speech detection is a challenging problem with most of the datasets...
research
08/04/2023

Sinhala-English Parallel Word Dictionary Dataset

Parallel datasets are vital for performing and evaluating any kind of mu...
research
06/12/2021

Exploiting Parallel Corpora to Improve Multilingual Embedding based Document and Sentence Alignment

Multilingual sentence representations pose a great advantage for low-res...
research
06/14/2023

Babel-ImageNet: Massively Multilingual Evaluation of Vision-and-Language Representations

Vision-and-language (VL) models with separate encoders for each modality...
research
10/26/2022

Modeling the Graphotactics of Low-Resource Languages Using Sequential GANs

Generative Adversarial Networks (GANs) have been shown to aid in the cre...
research
11/04/2019

A Novel Approach to Enhance the Performance of Semantic Search in Bengali using Neural Net and other Classification Techniques

Search has for a long time been an important tool for users to retrieve ...
