Learning with Noisy Labels by Targeted Relabeling

10/15/2021
by Derek Chen, et al.

Crowdsourcing platforms are often used to collect datasets for training deep neural networks, even though crowdsourced labels are less accurate than expert labels. There are two common strategies for managing the impact of this noise. The first aggregates redundant annotations, but comes at the expense of labeling substantially fewer examples. The second spends the entire annotation budget on labeling as many examples as possible and then applies denoising algorithms to implicitly clean up the dataset. We propose an approach that instead reserves a fraction of annotations to explicitly relabel highly probable labeling errors. In particular, we allocate a large portion of the labeling budget to form an initial dataset used to train a model. This model is then used to identify the specific examples that appear most likely to be incorrect, and we spend the remaining budget relabeling them. Experiments across three model variations and four natural language processing tasks show that our approach outperforms both label aggregation and advanced denoising methods designed to handle noisy labels when allocated the same annotation budget.
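The budget split and relabeling step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how suspicious examples are scored, so the criterion used here (lowest model probability assigned to the observed label) is an assumption, as are the function names `split_budget`, `select_for_relabeling`, and `apply_relabels`. It is not the authors' implementation.

```python
import numpy as np


def split_budget(total_annotations, relabel_fraction):
    """Split an annotation budget into (initial labeling, reserved relabeling)."""
    if not 0.0 <= relabel_fraction <= 1.0:
        raise ValueError("relabel_fraction must lie in [0, 1]")
    relabel = int(round(total_annotations * relabel_fraction))
    return total_annotations - relabel, relabel


def select_for_relabeling(probs, noisy_labels, k):
    """Return indices of the k examples whose observed label the model finds least likely.

    ASSUMPTION: the score is the predicted probability of the observed label;
    the source abstract does not specify the scoring rule.
    """
    probs = np.asarray(probs, dtype=float)
    noisy_labels = np.asarray(noisy_labels)
    # Probability the trained model assigns to each example's current label.
    p_observed = probs[np.arange(len(noisy_labels)), noisy_labels]
    # Lowest probability first; stable sort keeps ties in original order.
    order = np.argsort(p_observed, kind="stable")
    return order[:k]


def apply_relabels(noisy_labels, indices, new_labels):
    """Return a copy of the labels with fresh annotations written at `indices`."""
    updated = np.array(noisy_labels, copy=True)
    updated[indices] = new_labels
    return updated


# Toy example: 4 examples, 2 classes, model probabilities from an initial model.
probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.05, 0.95]]
noisy_labels = [0, 0, 1, 1]
suspects = select_for_relabeling(probs, noisy_labels, k=2)
cleaned = apply_relabels(noisy_labels, suspects, new_labels=[1, 0])
```

In a real pipeline, `probs` would come from the model trained on the initial dataset, and `new_labels` from annotators paid out of the reserved budget.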


research
01/27/2023

ActiveLab: Active Learning with Re-Labeling by Multiple Annotators

In real-world data labeling applications, annotators often provide imper...
research
12/26/2021

Budget Sensitive Reannotation of Noisy Relation Classification Data Using Label Hierarchy

Large crowd-sourced datasets are often noisy and relation classification...
research
09/09/2021

Truth Discovery in Sequence Labels from Crowds

Annotations quality and quantity positively affect the performance of se...
research
12/13/2017

Learning From Noisy Singly-labeled Data

Supervised learning depends on annotated examples, which are taken to be...
research
05/31/2017

Toward Robustness against Label Noise in Training Deep Discriminative Neural Networks

Collecting large training datasets, annotated with high-quality labels, ...
research
02/14/2018

Using Trusted Data to Train Deep Networks on Labels Corrupted by Severe Noise

The growing importance of massive datasets with the advent of deep learn...
research
02/06/2023

Interface Design for Crowdsourcing Hierarchical Multi-Label Text Annotations

Human data labeling is an important and expensive task at the heart of s...
