LINDA: Unsupervised Learning to Interpolate in Natural Language Processing

12/28/2021
by Yekyung Kim, et al.

Despite the success of mixup in data augmentation, its applicability to natural language processing (NLP) tasks has been limited by the discrete and variable-length nature of natural language. Recent studies have thus relied on domain-specific heuristics and manually crafted resources, such as dictionaries, to apply mixup in NLP. In this paper, we instead propose an unsupervised learning approach to text interpolation for the purpose of data augmentation, which we refer to as "Learning to INterpolate for Data Augmentation" (LINDA). It requires neither heuristics nor manually crafted resources, and learns to interpolate between any pair of natural language sentences over a natural language manifold. After empirically demonstrating LINDA's interpolation capability, we show that LINDA allows us to seamlessly apply mixup in NLP and leads to better generalization in text classification, both in-domain and out-of-domain.
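For context, the standard mixup operation mentioned in the abstract can be sketched as a convex combination of two examples and their labels. The snippet below is only an illustration of that generic operation on fixed-length numeric vectors; it is not LINDA and is not taken from the paper. The names `mixup` and `sample_lambda` and the default `alpha=1.0` are assumptions made for this sketch.

```python
import random


def mixup(x_i, x_j, y_i, y_j, lam):
    """Generic mixup on fixed-length vectors (illustrative, not LINDA).

    Returns lam * a + (1 - lam) * b for both the inputs and the labels.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    # Element-wise mixing only makes sense for vectors of equal length.
    if len(x_i) != len(x_j) or len(y_i) != len(y_j):
        raise ValueError("inputs must have matching lengths")
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x_mix, y_mix


def sample_lambda(alpha=1.0, rng=random):
    """Draw a mixing ratio from Beta(alpha, alpha); alpha=1.0 is an assumed default."""
    return rng.betavariate(alpha, alpha)


# Usage: mix two 2-dimensional examples with one-hot labels.
x_mix, y_mix = mixup([1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0], lam=0.25)
```

The length check shows why this operation does not transfer directly to text: two sentences are sequences of discrete tokens with different lengths, so there is no element-wise average to take. That is the gap the abstract says LINDA addresses by learning to interpolate between sentences.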

research
01/02/2021

Substructure Substitution: Structured Data Augmentation for NLP

We study a family of data augmentation methods, substructure substitutio...
research
05/22/2019

Augmenting Data with Mixup for Sentence Classification: An Empirical Study

Mixup, a recent proposed data augmentation method through linearly inter...
research
12/08/2020

Discovering key topics from short, real-world medical inquiries via natural language processing and unsupervised learning

Millions of unsolicited medical inquiries are received by pharmaceutical...
research
04/10/2020

Joint translation and unit conversion for end-to-end localization

A variety of natural language tasks require processing of textual data w...
research
09/27/2019

Automatically Learning Data Augmentation Policies for Dialogue Tasks

Automatic data augmentation (AutoAugment) (Cubuk et al., 2019) searches ...
research
02/19/2016

Learning to SMILE(S)

This paper shows how one can directly apply natural language processing ...
research
08/14/2018

R-grams: Unsupervised Learning of Semantic Units in Natural Language

This paper introduces a novel type of data-driven segmented unit that we...
