Active Learning for Natural Language Generation

05/24/2023
by   Yotam Perlitz, et al.

The field of text generation suffers from a severe shortage of labeled data because manual annotation is extremely expensive and time-consuming. A natural approach to this problem is active learning (AL), a well-known machine learning technique that improves annotation efficiency by selectively choosing the most informative examples to label. However, while AL has been well researched in the context of text classification, its application to text generation has remained largely unexplored. In this paper, we present a first systematic study of active learning for text generation, considering a diverse set of tasks and multiple leading AL strategies. Our results indicate that existing AL strategies, despite their success in classification, are largely ineffective in the text generation scenario and fail to consistently surpass the baseline of random example selection. We highlight some notable differences between the classification and generation scenarios, and analyze the selection behaviors of existing AL strategies. Our findings motivate exploring novel approaches for applying AL to natural language generation (NLG) tasks.
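To make the comparison in the abstract concrete, the following is a minimal sketch of a single pool-based selection step, contrasting the random baseline with a generic "most uncertain first" strategy. It is not the paper's implementation: the function names, the toy pool, and the `toy_uncertainty` score (which simply counts words) are hypothetical placeholders for whatever model-based informativeness score a real AL strategy would compute.

```python
import random
from typing import Callable, List, Sequence


def select_random(pool: Sequence[str], batch_size: int, seed: int = 0) -> List[int]:
    """Baseline: pick `batch_size` pool indices uniformly at random."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(len(pool)), batch_size))


def select_most_uncertain(
    pool: Sequence[str],
    batch_size: int,
    uncertainty: Callable[[str], float],
) -> List[int]:
    """Generic AL step: pick the `batch_size` examples with the highest score."""
    scores = [uncertainty(text) for text in pool]
    ranked = sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:batch_size])


def toy_uncertainty(text: str) -> float:
    """Hypothetical stand-in for a model-based score: word count of the input."""
    return float(len(text.split()))


# Toy unlabeled pool (illustrative only).
pool = [
    "short input",
    "a somewhat longer unlabeled input sentence",
    "tiny",
    "another fairly long unlabeled example to annotate",
]

random_batch = select_random(pool, batch_size=2)
uncertain_batch = select_most_uncertain(pool, batch_size=2, uncertainty=toy_uncertainty)
print("random baseline picks:", random_batch)
print("uncertainty strategy picks:", uncertain_batch)
```

In a full AL loop, the selected examples would be sent for annotation, added to the training set, and the model retrained before the next selection round. The abstract's finding is that, for generation tasks, strategies of the second kind do not consistently outperform the first.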
