Data Augmentation in Natural Language Processing: A Novel Text Generation Approach for Long and Short Text Classifiers

by Markus Bayer et al.

In many machine learning applications, research suggests that the development of training data may matter more than the choice and modelling of the classifiers themselves. Data augmentation methods have therefore been developed to improve classifiers with artificially created training data. In NLP, a particular challenge is establishing universal rules for text transformations that provide new linguistic patterns. In this paper, we present and evaluate a text generation method suitable for increasing the performance of classifiers for long and short texts. Enhancing the training data with our text generation method yielded promising improvements on both short and long text tasks. In a simulated low data regime, additive accuracy gains of up to 15.53 percentage points are achieved. Because the current practice of constructing such regimes is not universally applicable, we also show major improvements on several real-world low data tasks (up to +4.84 F1 score). Since we evaluate the method from many perspectives, we also observe situations where it might not be suitable. We discuss implications and patterns for the successful application of our approach to different types of datasets.
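
The abstract does not spell out the pipeline, so the following is only a minimal sketch of the generic experimental setup it refers to, under stated assumptions: a simulated low data regime is approximated as a stratified subsample of a labelled corpus, augmentation appends generated texts that inherit the label of the texts they were generated from, and an "additive" gain is read as the absolute accuracy difference in percentage points. All function names are hypothetical, and `shuffle_generator` is a trivial word-shuffling placeholder, not the authors' text generation method.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# An example is a (text, label) pair.
Example = Tuple[str, str]
# A generator takes the seed texts of one label and a count, and returns new texts.
Generator = Callable[[List[str], int], List[str]]


def group_by_label(data: List[Example]) -> Dict[str, List[str]]:
    """Collect the texts of each label, preserving their original order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for text, label in data:
        groups[label].append(text)
    return groups


def simulate_low_data_regime(data: List[Example], per_class: int, seed: int = 0) -> List[Example]:
    """Stratified subsample: keep at most `per_class` examples per label."""
    rng = random.Random(seed)
    subset: List[Example] = []
    for label, texts in sorted(group_by_label(data).items()):
        kept = rng.sample(texts, min(per_class, len(texts)))
        subset.extend((text, label) for text in kept)
    return subset


def augment(data: List[Example], generate: Generator, n_new: int) -> List[Example]:
    """Keep every original example and add `n_new` generated texts per label.

    Each generated text inherits the label of the seed texts it came from.
    """
    augmented = list(data)
    for label, texts in sorted(group_by_label(data).items()):
        augmented.extend((text, label) for text in generate(texts, n_new))
    return augmented


def additive_gain(acc_baseline: float, acc_augmented: float) -> float:
    """Absolute accuracy difference in percentage points (assumed meaning of "additive")."""
    return round((acc_augmented - acc_baseline) * 100, 2)


def shuffle_generator(seed_texts: List[str], n: int, seed: int = 42) -> List[str]:
    """HYPOTHETICAL stand-in generator: shuffles the word order of seed texts.

    This is NOT the paper's text generation method; swap in a real generator.
    """
    rng = random.Random(seed)
    outputs = []
    for i in range(n):
        words = seed_texts[i % len(seed_texts)].split()
        rng.shuffle(words)
        outputs.append(" ".join(words))
    return outputs


if __name__ == "__main__":
    corpus = [
        ("great movie", "pos"), ("loved it", "pos"), ("fantastic acting", "pos"),
        ("awful plot", "neg"), ("boring film", "neg"),
    ]
    small = simulate_low_data_regime(corpus, per_class=1)
    bigger = augment(small, shuffle_generator, n_new=3)
    print(len(small), len(bigger))
    print(additive_gain(0.60, 0.75))
```

The accuracies passed to `additive_gain` here are illustrative only. In a real experiment, the same classifier would be trained once on `small` and once on `bigger`, both evaluated on the same held-out test set, and the two accuracies compared.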

Data Augmentation for Text Generation Without Any Augmented Data

Data augmentation is an effective way to improve the performance of many...

Evaluating the Morphosyntactic Well-formedness of Generated Texts

Text generation systems are ubiquitous in natural language processing ap...

Automatic Conditional Generation of Personalized Social Media Short Texts

Automatic text generation has received much attention owing to rapid dev...

Controlled Text Generation for Data Augmentation in Intelligent Artificial Agents

Data availability is a bottleneck during early stages of development of ...

Not Just Pretty Pictures: Text-to-Image Generators Enable Interpretable Interventions for Robust Representations

Neural image classifiers are known to undergo severe performance degrada...

Neural Data-to-Text Generation with LM-based Text Augmentation

For many new application domains for data-to-text generation, the main o...

STA: Self-controlled Text Augmentation for Improving Text Classifications

Despite recent advancements in Machine Learning, many tasks still involv...