Cascading Biases: Investigating the Effect of Heuristic Annotation Strategies on Data and Models

by Chaitanya Malaviya, et al.

Cognitive psychologists have documented that humans use cognitive heuristics, or mental shortcuts, to make quick decisions while expending less effort. We hypothesize that when annotators perform annotation work on crowdsourcing platforms, their use of such heuristics cascades on to data quality and model robustness. In this work, we study cognitive heuristic use in the context of annotating multiple-choice reading comprehension datasets. We propose tracking annotator heuristic traces, where we tangibly measure low-effort annotation strategies that could indicate usage of various cognitive heuristics. We find evidence that annotators might be using multiple such heuristics, based on correlations with a battery of psychological tests. Importantly, heuristic use among annotators determines data quality along several dimensions: (1) known biased models, such as partial input models, more easily solve examples authored by annotators who rate highly on heuristic use, (2) models trained on examples from annotators who score highly on heuristic use don't generalize as well, and (3) heuristic-seeking annotators tend to create qualitatively less challenging examples. Our findings suggest that tracking heuristic usage among annotators can potentially help with collecting challenging datasets and diagnosing model biases.




