A Survey on Cost Types, Interaction Schemes, and Annotator Performance Models in Selection Algorithms for Active Learning in Classification

09/23/2021
by Marek Herde, et al.

Pool-based active learning (AL) aims to optimize the annotation process (i.e., labeling) as the acquisition of annotations is often time-consuming and therefore expensive. For this purpose, an AL strategy queries annotations intelligently from annotators to train a high-performance classification model at a low annotation cost. Traditional AL strategies operate in an idealized framework. They assume a single, omniscient annotator who never gets tired and charges uniformly regardless of query difficulty. However, in real-world applications, we often face human annotators, e.g., crowd or in-house workers, who make annotation mistakes and can be reluctant to respond if tired or faced with complex queries. Recently, a wide range of novel AL strategies has been proposed to address these issues. They differ in at least one of the following three central aspects from traditional AL: (1) They explicitly consider (multiple) human annotators whose performances can be affected by various factors, such as missing expertise. (2) They generalize the interaction with human annotators by considering different query and annotation types, such as asking an annotator for feedback on an inferred classification rule. (3) They take more complex cost schemes regarding annotations and misclassifications into account. This survey provides an overview of these AL strategies and refers to them as real-world AL. To this end, we introduce a general real-world AL strategy as part of a learning cycle and use its elements, e.g., the query and annotator selection algorithm, to categorize about 60 real-world AL strategies. Finally, we outline possible directions for future research in the field of AL.
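To make the learning cycle described above concrete, the following is a minimal, hypothetical sketch of a pool-based AL loop with uncertainty-based query selection, a naive annotator-selection heuristic, and two simulated annotators. It is not the survey's own algorithm; the toy data, the annotator error rates, the reliability update, and the use of scikit-learn are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: 200 two-dimensional points with a linear ground-truth concept.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y_true = (X[:, 0] + X[:, 1] > 0).astype(int)

# Seed set containing both classes; the remaining instances form the unlabeled pool.
labeled = list(np.flatnonzero(y_true == 0)[:5]) + list(np.flatnonzero(y_true == 1)[:5])
pool = [i for i in range(len(X)) if i not in labeled]
annotations = {i: int(y_true[i]) for i in labeled}

# Two simulated annotators with different (hidden) error rates and a
# running reliability estimate maintained by the learner (uniform prior).
annotator_error = [0.05, 0.25]
annotator_score = [0.5, 0.5]

clf = LogisticRegression()
for _ in range(20):  # annotation budget of 20 queries
    clf.fit(X[labeled], [annotations[i] for i in labeled])

    # Query selection: the pool instance the current model is least certain about.
    proba = clf.predict_proba(X[pool])[:, 1]
    query = pool[int(np.argmin(np.abs(proba - 0.5)))]

    # Annotator selection: prefer the annotator currently estimated as most reliable.
    a = int(np.argmax(annotator_score))

    # Simulated (possibly erroneous) annotation from the chosen annotator.
    label = int(y_true[query]) ^ int(rng.random() < annotator_error[a])

    # Naive annotator performance model: agreement with the model's own prediction.
    agrees = float(label == clf.predict(X[[query]])[0])
    annotator_score[a] = 0.9 * annotator_score[a] + 0.1 * agrees

    annotations[query] = label
    labeled.append(query)
    pool.remove(query)

clf.fit(X[labeled], [annotations[i] for i in labeled])
print("accuracy on all data:", clf.score(X, y_true))
```

The real-world AL strategies categorized in the survey replace these deliberately naive components, e.g., by using richer query and annotation types, explicit annotator performance models, and cost schemes that go beyond one uniform price per annotation.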
