Crowdsourcing subjective annotations using pairwise comparisons reduces bias and error compared to the majority-vote method

05/31/2023
by Hasti Narimanzadeh, et al.
Helsingin yliopisto
aalto

How best to reduce the measurement variability and bias that subjectivity introduces into crowdsourced labelling remains an open question. We introduce a theoretical framework for understanding how random error and measurement bias enter into crowdsourced annotations of subjective constructs. We then propose a pipeline that combines pairwise comparison labelling with Elo scoring, and demonstrate that it outperforms the ubiquitous majority-voting method in reducing both types of measurement error. To assess the performance of the labelling approaches, we constructed an agent-based model of crowdsourced labelling that lets us introduce different types of subjectivity into the tasks. We find that under most conditions with task subjectivity, the comparison approach produced higher F1 scores. Further, the comparison approach is less susceptible to inflating bias, whereas majority voting tends to inflate it. To facilitate applications, we show with simulated and real-world data that the number of random comparisons required for the same classification accuracy scales log-linearly, O(N log N), with the number of labelled items. We also implemented the Elo system as an open-source Python package.
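The combination of random pairwise comparisons with Elo scoring described above can be illustrated with a minimal sketch. This is not the authors' package or its API: the function names, the K-factor of 32, the scale of 400, and the initial rating of 1500 are conventional Elo choices assumed here purely for illustration, and `prefer` is a hypothetical stand-in for an annotator's judgement.

```python
import random


def expected_score(r_a, r_b, scale=400.0):
    """Standard Elo expectation that item A is preferred over item B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / scale))


def elo_update(r_a, r_b, a_won, k=32.0):
    """Return updated (r_a, r_b) after one comparison; the update is zero-sum."""
    delta = k * ((1.0 if a_won else 0.0) - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta


def rank_by_comparisons(items, prefer, n_comparisons, seed=0, initial=1500.0):
    """Score items from randomly drawn pairs.

    `prefer(a, b)` is a hypothetical annotator callback that returns True
    when `a` is judged to express more of the construct than `b`.
    """
    rng = random.Random(seed)
    ratings = {item: initial for item in items}
    for _ in range(n_comparisons):
        a, b = rng.sample(items, 2)  # draw a random pair of distinct items
        ratings[a], ratings[b] = elo_update(ratings[a], ratings[b], prefer(a, b))
    return ratings


# Toy usage: items are integers and a noiseless annotator prefers the larger one.
items = list(range(10))
ratings = rank_by_comparisons(items, lambda a, b: a > b, n_comparisons=500)
```

In this sketch a binary label could then be obtained by thresholding the resulting ratings; how the threshold is chosen is left open here, since the abstract does not specify it.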

