Identifying Incorrect Annotations in Multi-Label Classification Data

11/25/2022
by Aditya Thyagarajan, et al.

In multi-label classification, each example in a dataset may be annotated as belonging to one or more classes (or none of the classes). Example applications include image or document tagging, where each possible tag either applies to a given image or document or it does not. With many possible classes to consider, data annotators are likely to make errors when labeling such data in practice. Here we consider algorithms for finding mislabeled examples in multi-label classification datasets. We propose an extension of the Confident Learning framework to this setting, as well as a label quality score that ranks examples with label errors much higher than those that are correctly labeled. Both approaches can utilize any trained classifier. After demonstrating that our methodology empirically outperforms other algorithms for label error detection, we apply our approach to discover many label errors in the CelebA image tagging dataset.
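The abstract's idea of a classifier-based label quality score can be sketched simply. The snippet below is a simplified illustration, not the paper's exact method: the function name `label_quality_scores` and the min/mean aggregation choices are our own. It scores each example by how confident the model is in the annotated value of every tag, so examples whose annotations disagree with a trained classifier rank lowest.

```python
import numpy as np

def label_quality_scores(pred_probs, labels, aggregate="min"):
    """Score each example's overall label quality in [0, 1].

    pred_probs: (n, K) array of per-class predicted probabilities
                from any trained multi-label classifier.
    labels:     (n, K) binary array of annotated tags.
    """
    # Self-confidence per class: the probability the model assigns
    # to the value the annotator chose (p for tag=1, 1 - p for tag=0).
    class_conf = np.where(labels == 1, pred_probs, 1.0 - pred_probs)
    # Aggregate across classes; "min" flags an example if even one
    # tag annotation looks inconsistent with the model.
    if aggregate == "min":
        return class_conf.min(axis=1)
    return class_conf.mean(axis=1)

# Toy example: two images, two tags. The second image's annotations
# disagree with the classifier on both tags, so it scores lower.
pred_probs = np.array([[0.9, 0.1],
                       [0.9, 0.1]])
labels = np.array([[1, 0],
                   [0, 1]])
scores = label_quality_scores(pred_probs, labels)
# Examples with the lowest scores are surfaced first for review.
```

Ranking examples by this score and reviewing the lowest-scoring ones is the basic workflow; the paper's contribution lies in designing a score and a Confident Learning extension that rank true label errors far above correctly labeled examples.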

Related research

Understanding Label Bias in Single Positive Multi-Label Learning (05/24/2023)
Annotating data for multi-label classification is prohibitively expensiv...

Item Tagging for Information Retrieval: A Tripartite Graph Neural Network based Approach (08/26/2020)
Tagging has been recognized as a successful practice to boost relevance ...

On Evaluation of Document Classification using RVL-CDIP (06/21/2023)
The RVL-CDIP benchmark is widely used for measuring performance on the t...

Taxonomizing and Measuring Representational Harms: A Look at Image Tagging (05/02/2023)
In this paper, we examine computational approaches for measuring the "fa...

Utilizing supervised models to infer consensus labels and their quality from data with multiple annotators (10/13/2022)
Real-world data for classification is often labeled by multiple annotato...

Multi-label Classification for Automatic Tag Prediction in the Context of Programming Challenges (11/27/2019)
One of the best ways for developers to test and improve their skills in ...

Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks (03/26/2021)
We algorithmically identify label errors in the test sets of 10 of the m...
