WRENCH: A Comprehensive Benchmark for Weak Supervision

by Jieyu Zhang et al.
Georgia Institute of Technology
University of Washington

Recent Weak Supervision (WS) approaches have had widespread success in easing the bottleneck of labeling training data for machine learning by synthesizing labels from multiple potentially noisy supervision sources. However, properly measuring and analyzing these approaches remains a challenge. First, the datasets used in existing works are often private and/or custom, which limits standardization. Second, WS datasets with the same name and base data often differ in the labels and weak supervision sources they use, a significant "hidden" source of evaluation variance. Finally, WS studies often diverge in the evaluation protocols and ablations they use. To address these problems, we introduce a benchmark platform, WRENCH, for thorough and standardized evaluation of WS approaches. It consists of 22 varied real-world datasets for classification and sequence tagging; a range of real, synthetic, and procedurally generated weak supervision sources; and a modular, extensible framework for WS evaluation, including implementations of popular WS methods. We use WRENCH to conduct extensive comparisons over more than 100 method variants to demonstrate its efficacy as a benchmark platform. The code is available at <https://github.com/JieyuZ2/wrench>.
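As a rough illustration of what "synthesizing labels from multiple potentially noisy supervision sources" means, the sketch below aggregates the votes of several sources by simple majority vote, one of the simplest label-aggregation baselines. It is a minimal sketch, not WRENCH's API: the function name `majority_vote`, the `ABSTAIN = -1` convention for a source that does not fire, and the choice to return `None` on ties are all assumptions made for this example.

```python
from collections import Counter
from typing import List, Optional

# Assumed convention (not taken from WRENCH): a source that does not
# fire on an example casts the vote -1.
ABSTAIN = -1


def majority_vote(votes: List[List[int]]) -> List[Optional[int]]:
    """Aggregate weak-source votes into one label per example.

    votes[i][j] is source j's label for example i, or ABSTAIN.
    Rows where every source abstains, or where the top classes tie,
    get None (left unlabeled) in this sketch.
    """
    labels: List[Optional[int]] = []
    for row in votes:
        # Count only non-abstaining votes.
        counts = Counter(v for v in row if v != ABSTAIN)
        if not counts:
            labels.append(None)  # no source fired
            continue
        ranked = counts.most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            labels.append(None)  # tie between the leading classes
        else:
            labels.append(ranked[0][0])
    return labels


if __name__ == "__main__":
    # Hypothetical votes from three sources on four examples.
    label_matrix = [
        [1, 1, 0],
        [0, ABSTAIN, 0],
        [ABSTAIN, ABSTAIN, ABSTAIN],
        [1, 0, ABSTAIN],
    ]
    print(majority_vote(label_matrix))
```

Running the script prints `[1, 0, None, None]`. Majority vote treats every source as equally reliable; the label models evaluated in benchmarks like this one typically go further by estimating per-source accuracies and correlations.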




A Survey on Programmatic Weak Supervision

Labeling training data has become one of the major roadblocks to using m...

WALNUT: A Benchmark on Weakly Supervised Learning for Natural Language Understanding

Building quality machine learning models for natural language understand...

Learning Dependency Structures for Weak Supervision Models

Labeling training data is a key bottleneck in the modern machine learnin...

Alfred: A System for Prompted Weak Supervision

Alfred is the first system for programmatic weak supervision (PWS) that ...

pyKT: A Python Library to Benchmark Deep Learning based Knowledge Tracing Models

Knowledge tracing (KT) is the task of using students' historical learnin...

Binary Classification with Positive Labeling Sources

To create a large amount of training labels for machine learning models ...

Understanding Programmatic Weak Supervision via Source-aware Influence Function

Programmatic Weak Supervision (PWS) aggregates the source votes of multi...
