Bayesian Ensembles of Crowds and Deep Learners for Sequence Tagging

11/02/2018
by Edwin Simpson, et al.

Current methods for sequence tagging, a core task in NLP, are data-hungry. Crowdsourcing is a relatively cheap way to obtain labeled data, but annotators are unreliable, so redundant labeling and aggregation techniques are required. We evaluate multiple models of annotator reliability and develop a Bayesian method for aggregating sequence labels from multiple annotators. Typically, data collection, aggregation, and training a sequence tagger form a pipeline of discrete steps. We integrate these steps by training black-box sequence taggers as components in the aggregation model and accounting for their unreliability. We evaluate our model on named entity recognition and information extraction tasks, showing that it outperforms previous methods, particularly in the small-data scenarios encountered at the beginning of a crowdsourcing process. Our code is published to encourage adaptation and reuse.
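
To make the aggregation problem concrete, here is a minimal sketch of a much simpler baseline than the Bayesian method described above: per-token majority voting, followed by a re-vote weighted by each annotator's agreement with the consensus. This is not the authors' model. The function names, the toy BIO tags, and the annotator names are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def majority_vote(annotations):
    """Pick the most frequent tag at each token position.

    annotations: dict mapping annotator name -> list of tags,
    with all lists the same length (an assumption of this sketch).
    """
    length = len(next(iter(annotations.values())))
    consensus = []
    for i in range(length):
        counts = Counter(tags[i] for tags in annotations.values())
        consensus.append(counts.most_common(1)[0][0])
    return consensus


def annotator_accuracy(annotations, consensus):
    """Estimate reliability as the fraction of tokens matching the consensus."""
    return {
        name: sum(t == c for t, c in zip(tags, consensus)) / len(consensus)
        for name, tags in annotations.items()
    }


def weighted_vote(annotations, weights):
    """Re-vote with each annotator's tag counted by their reliability weight."""
    length = len(next(iter(annotations.values())))
    result = []
    for i in range(length):
        scores = defaultdict(float)
        for name, tags in annotations.items():
            scores[tags[i]] += weights[name]
        result.append(max(scores, key=scores.get))
    return result


# Toy example: three hypothetical annotators tag "Edwin visited Paris".
annotations = {
    "ann_a": ["B-PER", "O", "B-LOC"],
    "ann_b": ["B-PER", "O", "B-LOC"],
    "ann_c": ["O", "O", "B-LOC"],
}

consensus = majority_vote(annotations)
accuracy = annotator_accuracy(annotations, consensus)
reweighted = weighted_vote(annotations, accuracy)
print(consensus, accuracy, reweighted)
```

Because this sketch votes on each token independently, it ignores dependencies between neighbouring tags and can, in general, produce label sequences that are not valid spans. It also treats reliability as a single number per annotator, whereas the abstract refers to evaluating multiple models of annotator reliability.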

