
End-to-End Weak Supervision

by Salva Rühling Cachay et al.

Aggregating multiple sources of weak supervision (WS) can ease the data-labeling bottleneck prevalent in many machine learning applications by replacing the tedious manual collection of ground-truth labels. Current state-of-the-art approaches that use no labeled training data, however, require two separate modeling steps: learning a probabilistic latent-variable model from the WS sources, under assumptions that rarely hold in practice, followed by downstream model training. Crucially, this first modeling step does not take the performance of the downstream model into account. To address these shortcomings, we propose an end-to-end approach that learns the downstream model directly, by maximizing its agreement with probabilistic labels generated by reparameterizing previous probabilistic posteriors with a neural network. Our results show improved performance over prior work, both in end-model accuracy on downstream test sets and in robustness to dependencies among the weak supervision sources.
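The core objective can be sketched in a few lines: a small network maps the votes of the weak sources to a probabilistic (soft) label, and the downstream model is trained to agree with that label via cross-entropy. The sketch below is a minimal NumPy illustration of this agreement objective, not the paper's implementation; the names `label_model` and `agreement_loss`, the one-hot vote encoding, and the handling of abstains as all-zero encodings are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def label_model(votes, W, k):
    """Hypothetical neural label model: maps the votes of m weak sources
    (each a class in 0..k-1, or -1 for abstain) to a soft label over k classes.
    votes: (n, m) int array; W: (m * k, k) weight matrix (one linear layer here)."""
    n, m = votes.shape
    enc = np.zeros((n, m * k))
    for j in range(m):
        mask = votes[:, j] >= 0           # abstains stay all-zero
        enc[mask, j * k + votes[mask, j]] = 1.0
    return softmax(enc @ W)               # (n, k) probabilistic labels

def agreement_loss(end_probs, soft_labels, eps=1e-9):
    """Cross-entropy agreement between the end model's predicted class
    probabilities and the label model's soft labels."""
    return -np.mean(np.sum(soft_labels * np.log(end_probs + eps), axis=1))

# Example: 2 data points, 3 weak sources, binary task (k = 2).
votes = np.array([[0, 1, -1],
                  [1, 1,  0]])
k = 2
W = np.random.default_rng(0).normal(size=(3 * k, k))
soft_labels = label_model(votes, W, k)
loss = agreement_loss(np.full((2, k), 1.0 / k), soft_labels)
```

In the end-to-end setting, both the label model's parameters (`W` here, a full network in practice) and the downstream model producing `end_probs` would be updated jointly by minimizing this agreement loss, rather than fitting the label model in isolation first.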





Code Repositories


Weakly Supervised End-to-End Learning
