Active Learning for Abstractive Text Summarization

01/09/2023
by Akim Tsvigun, et al.

Constructing human-curated annotated datasets for abstractive text summarization (ATS) is time-consuming and expensive, because creating each instance requires a human annotator to read a long document and compose a shorter summary that preserves the key information of the original. Active learning (AL) is a technique developed to reduce the amount of annotation required to achieve a given level of machine learning model performance. In information extraction and text classification, AL can reduce the annotation effort by up to several times. Despite its potential for aiding expensive annotation, to the best of our knowledge there have been no effective AL query strategies for ATS. This stems from the fact that many AL strategies rely on uncertainty estimation, whereas, as we show in our work, uncertain instances are usually noisy, and selecting them can degrade model performance compared to passive annotation. We address this problem by proposing the first effective query strategy for AL in ATS based on diversity principles. We show that, given a fixed annotation budget, using our strategy in AL annotation improves model performance in terms of ROUGE and consistency scores. Additionally, we analyze the effect of self-learning and show that it can further increase model performance.
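
To make the idea of a diversity-based query step concrete, the following is a minimal sketch of one generic way such a selection could work: greedy farthest-first selection over document embeddings. It is not the strategy proposed in the paper, which the abstract describes only as "based on diversity principles". The function names (`cosine_distance`, `select_diverse`), the use of cosine distance, and the assumption that each document is already represented by an embedding vector are all hypothetical choices made for illustration.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two vectors; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def select_diverse(pool, labeled, budget):
    """Greedy farthest-first selection (illustrative, not the paper's method).

    pool    -- embeddings of unlabeled documents (hypothetical input)
    labeled -- embeddings of already annotated documents
    budget  -- number of documents to send to the annotator
    Returns indices into `pool`, in the order they were picked.
    """
    if budget <= 0 or not pool:
        return []
    # Distance from each pool item to its nearest already covered document.
    min_dist = [
        min((cosine_distance(x, l) for l in labeled), default=math.inf)
        for x in pool
    ]
    chosen = []
    for _ in range(min(budget, len(pool))):
        # Pick the document that is farthest from everything covered so far.
        idx = max(
            (i for i in range(len(pool)) if i not in chosen),
            key=lambda i: min_dist[i],
        )
        chosen.append(idx)
        # The picked document now counts as covered; refresh the distances.
        for i, x in enumerate(pool):
            min_dist[i] = min(min_dist[i], cosine_distance(x, pool[idx]))
    return chosen
```

In a pool-based AL loop, a step like this would replace an uncertainty-based ranking: the selected documents are sent for summary annotation, added to the labeled set, and the model is retrained before the next round.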
