XLDA: Cross-Lingual Data Augmentation for Natural Language Inference and Question Answering

05/27/2019
by   Jasdeep Singh, et al.
2

While natural language processing systems often focus on a single language, multilingual transfer learning has the potential to improve performance, especially for low-resource languages. We introduce XLDA, cross-lingual data augmentation, a method that replaces a segment of the input text with its translation in another language. XLDA enhances performance of all 14 tested languages of the cross-lingual natural language inference (XNLI) benchmark. With improvements of up to 4.8%, training with XLDA achieves state-of-the-art performance for Greek, Turkish, and Urdu. XLDA is in contrast to, and performs markedly better than, a more naive approach that aggregates examples in various languages in a way that each example is solely in one language. On the SQuAD question answering task, we see that XLDA provides a 1.0% performance increase on the English evaluation set. Comprehensive experiments suggest that most languages are effective as cross-lingual augmentors, that XLDA is robust to a wide range of translation quality, and that XLDA is even more effective for randomly initialized models than for pretrained models.

READ FULL TEXT
research
01/16/2023

XNLI 2.0: Improving XNLI dataset and performance on Cross Lingual Understanding (XLU)

Natural Language Processing systems are heavily dependent on the availab...
research
04/28/2020

MultiMix: A Robust Data Augmentation Strategy for Cross-Lingual NLP

Transfer learning has yielded state-of-the-art results in many supervise...
research
04/09/2021

TransWiC at SemEval-2021 Task 2: Transformer-based Multilingual and Cross-lingual Word-in-Context Disambiguation

Identifying whether a word carries the same meaning or different meaning...
research
06/25/2021

ParaLaw Nets – Cross-lingual Sentence-level Pretraining for Legal Text Processing

Ambiguity is a characteristic of natural language, which makes expressio...
research
12/07/2022

JamPatoisNLI: A Jamaican Patois Natural Language Inference Dataset

JamPatoisNLI provides the first dataset for natural language inference i...
research
05/30/2022

ZusammenQA: Data Augmentation with Specialized Models for Cross-lingual Open-retrieval Question Answering System

This paper introduces our proposed system for the MIA Shared Task on Cro...
research
05/23/2023

Evaluating and Modeling Attribution for Cross-Lingual Question Answering

Trustworthy answer content is abundant in many high-resource languages a...

Please sign up or login with your details

Forgot password? Click here to reset