Jury Learning: Integrating Dissenting Voices into Machine Learning Models

02/07/2022
by Mitchell L. Gordon, et al.

Whose labels should a machine learning (ML) algorithm learn to emulate? For ML tasks ranging from online comment toxicity to misinformation detection to medical diagnosis, different groups in society may have irreconcilable disagreements about ground truth labels. Supervised ML today resolves these label disagreements implicitly using majority vote, which overrides minority groups' labels. We introduce jury learning, a supervised ML approach that resolves these disagreements explicitly through the metaphor of a jury: defining which people or groups, in what proportion, determine the classifier's prediction. For example, a jury learning model for online toxicity might centrally feature women and Black jurors, who are commonly targets of online harassment. To enable jury learning, we contribute a deep learning architecture that models every annotator in a dataset, samples from annotators' models to populate the jury, then runs inference to classify. Our architecture enables juries that dynamically adapt their composition, explore counterfactuals, and visualize dissent.
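The pipeline described above (model each annotator, sample annotators to fill the jury, then run inference) can be illustrated with a small sketch. This is a toy illustration under stated assumptions, not the authors' architecture: the annotator pool, the group labels, the fixed scores in `TOY_SCORES`, the function names, and the choice to aggregate by taking the median of each sampled jury and averaging over trials are all hypothetical. In particular, `predict_rating` is a lookup table standing in for a learned per-annotator model.

```python
import random
from statistics import mean, median

# Hypothetical annotator pool: annotator id -> group label (illustrative only).
ANNOTATOR_GROUPS = {
    "a1": "women", "a2": "women", "a3": "women",
    "a4": "men", "a5": "men", "a6": "men",
}

# Hypothetical fixed toxicity scores (0-4) standing in for per-annotator models.
TOY_SCORES = {"a1": 3.0, "a2": 4.0, "a3": 3.0, "a4": 1.0, "a5": 2.0, "a6": 1.0}


def predict_rating(annotator_id, comment):
    """Stand-in for a learned model of one annotator; ignores the comment text."""
    return TOY_SCORES[annotator_id]


def sample_jury(composition, groups, rng):
    """Draw jurors without replacement so each group fills its allotted seats."""
    jurors = []
    for group, seats in composition.items():
        pool = [a for a, g in groups.items() if g == group]
        if len(pool) < seats:
            raise ValueError(f"not enough annotators in group {group!r}")
        jurors.extend(rng.sample(pool, seats))
    return jurors


def jury_verdict(comment, composition, groups, predict, trials=20, seed=0):
    """Average the median juror rating over several randomly sampled juries."""
    rng = random.Random(seed)
    trial_medians = []
    for _ in range(trials):
        jurors = sample_jury(composition, groups, rng)
        votes = [predict(j, comment) for j in jurors]
        trial_medians.append(median(votes))
    return mean(trial_medians)


if __name__ == "__main__":
    composition = {"women": 2, "men": 1}  # seats per group, chosen explicitly
    print(jury_verdict("example comment", composition,
                       ANNOTATOR_GROUPS, predict_rating))
```

Changing `composition` changes whose predicted labels drive the output, which is the explicit choice the jury metaphor is meant to surface. Keeping the individual `votes` rather than only their median would also allow dissent within a jury to be inspected.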


