
Dependency Structure Misspecification in Multi-Source Weak Supervision Models

by Salva Rühling Cachay, et al.

Data programming (DP) has proven to be an attractive alternative to costly hand-labeling of data. In DP, users encode domain knowledge into labeling functions (LFs): heuristics that noisily label a subset of the data and may have complex dependencies on one another. A label model is then fit to the LF outputs to produce an estimate of the unknown class label. The effects of label model misspecification on the test set performance of a downstream classifier are understudied. This presents a serious awareness gap for practitioners, in particular since the dependency structure among LFs is frequently ignored in field applications of DP. We analyse modeling errors due to structure over-specification. We derive novel theoretical bounds on the modeling error and empirically show that this error can be substantial, even when modeling a seemingly sensible structure.
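
To make the DP pipeline described in the abstract concrete, the following is a minimal sketch in Python, not the authors' method. The toy spam task, the three labeling functions, and the helper names (apply_lfs, majority_vote) are all hypothetical, and a plain majority vote stands in for a fitted label model.

```python
from collections import Counter

ABSTAIN = -1  # an LF returns this when its heuristic does not apply

# Hypothetical labeling functions for a toy spam task (1 = spam, 0 = not spam).
def lf_contains_free(text):
    return 1 if "free" in text.lower() else ABSTAIN

def lf_contains_winner(text):
    return 1 if "winner" in text.lower() else ABSTAIN

def lf_greeting(text):
    return 0 if text.lower().startswith("hi ") else ABSTAIN

LFS = [lf_contains_free, lf_contains_winner, lf_greeting]

def apply_lfs(texts):
    """Build the label matrix: one row per example, one column per LF."""
    return [[lf(t) for lf in LFS] for t in texts]

def majority_vote(row):
    """Stand-in label model (an assumption, not the paper's model):
    return the most common non-abstain vote, or ABSTAIN on ties or no votes."""
    votes = Counter(v for v in row if v != ABSTAIN)
    if not votes:
        return ABSTAIN
    ranked = votes.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]

texts = [
    "You are a WINNER, claim your free prize",
    "hi Anna, lunch tomorrow?",
    "Quarterly report attached",
    "hi there, free tickets for you",
]
label_matrix = apply_lfs(texts)
estimated_labels = [majority_vote(row) for row in label_matrix]
```

Majority vote counts every LF equally and ignores any dependency between them; in this sketch, lf_contains_free and lf_contains_winner could plausibly fire together on the same messages. A fitted label model would instead estimate LF accuracies and, optionally, a dependency structure, which is the modeling choice the abstract is concerned with.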


