Multilingual Augmenter: The Model Chooses

02/19/2021
by Matthew Ciolino, et al.

Natural Language Processing (NLP) relies heavily on training data, and transformers, as they have grown larger, require ever more of it. Text augmentation offers a way to expand an existing dataset and help models generalize. One such technique is translation augmentation, or back-translation: an English sentence is translated into another language and then translated back into English. In this paper, we examine the effect of back-translation through 108 different languages on various metrics and text embeddings.
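The back-translation pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `translate` function is a labeled stand-in (in practice one might use machine translation models such as the Helsinki-NLP MarianMT checkpoints via the `transformers` library), and the function names are our own.

```python
# Back-translation augmentation sketch: English -> pivot language -> English.

def translate(text: str, src: str, tgt: str) -> str:
    # Toy stand-in for a real MT model, used only to make the round trip
    # visible; a real implementation would call a translation model here.
    return f"[{src}->{tgt}] {text}"

def back_translate(sentence: str, pivot: str) -> str:
    """Round-trip an English sentence through a single pivot language."""
    pivoted = translate(sentence, "en", pivot)
    return translate(pivoted, pivot, "en")

def augment(dataset: list[str], pivots: list[str]) -> list[str]:
    """Expand a dataset with one back-translated copy per pivot language."""
    augmented = list(dataset)
    for sentence in dataset:
        for pivot in pivots:
            augmented.append(back_translate(sentence, pivot))
    return augmented
```

With a real translator, each pivot language yields a paraphrase of the original sentence, so a corpus grows by one copy per pivot; the paper's setting of 108 pivot languages would multiply each sentence into up to 109 variants.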


