
End-to-End Weak Supervision

by Salva Rühling Cachay et al.

Aggregating multiple sources of weak supervision (WS) can ease the data-labeling bottleneck prevalent in many machine learning applications by replacing the tedious manual collection of ground-truth labels. Current state-of-the-art approaches that use no labeled training data, however, require two separate modeling steps: learning a probabilistic latent-variable model from the WS sources, under assumptions that rarely hold in practice, followed by downstream model training. Crucially, this first modeling step does not take the performance of the downstream model into account. To address these shortcomings, we propose an end-to-end approach that learns the downstream model directly, by maximizing its agreement with probabilistic labels generated by reparameterizing previous probabilistic posteriors with a neural network. Our results show improved performance over prior work, both in end-model accuracy on downstream test sets and in robustness to dependencies among the weak supervision sources.
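The core objective can be sketched in a few lines: a small network maps the votes of the weak sources to a probabilistic (soft) label, and the downstream model is trained to agree with that label via cross-entropy. The sketch below is a minimal NumPy illustration of this agreement objective, not the paper's implementation; the names `label_model` and `agreement_loss`, the one-hot vote encoding, and the handling of abstains as all-zero encodings are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def label_model(votes, W, k):
    """Hypothetical neural label model: maps the votes of m weak sources
    (each a class in 0..k-1, or -1 for abstain) to a soft label over k classes.
    votes: (n, m) int array; W: (m * k, k) weight matrix (one linear layer here)."""
    n, m = votes.shape
    enc = np.zeros((n, m * k))
    for j in range(m):
        mask = votes[:, j] >= 0           # abstains stay all-zero
        enc[mask, j * k + votes[mask, j]] = 1.0
    return softmax(enc @ W)               # (n, k) probabilistic labels

def agreement_loss(end_probs, soft_labels, eps=1e-9):
    """Cross-entropy agreement between the end model's predicted class
    probabilities and the label model's soft labels."""
    return -np.mean(np.sum(soft_labels * np.log(end_probs + eps), axis=1))

# Example: 2 data points, 3 weak sources, binary task (k = 2).
votes = np.array([[0, 1, -1],
                  [1, 1,  0]])
k = 2
W = np.random.default_rng(0).normal(size=(3 * k, k))
soft_labels = label_model(votes, W, k)
loss = agreement_loss(np.full((2, k), 1.0 / k), soft_labels)
```

In the end-to-end setting, both the label model's parameters (`W` here, a full network in practice) and the downstream model producing `end_probs` would be updated jointly by minimizing this agreement loss, rather than fitting the label model in isolation first.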





Code Repositories


Weakly Supervised End-to-End Learning
