Training a Neural Network in a Low-Resource Setting on Automatically Annotated Noisy Data

07/02/2018
by Michael A. Hedderich, et al.

Manually labeled corpora are expensive to create and often unavailable for low-resource languages or domains. Automatic labeling approaches offer a quicker and cheaper way to obtain labeled data. However, these labels often contain more errors, which can degrade the performance of a classifier trained on them. We propose a noise layer that is added to a neural network architecture. This makes it possible to model the noise and to train on a combination of clean and noisy data. We show that in a low-resource NER task we can improve performance by up to 35% by handling the noise.
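
As a rough illustration of the noise-layer idea, the sketch below assumes the noise is modeled as a row-stochastic confusion matrix that maps the base model's distribution over clean labels to a distribution over noisy labels. The abstract does not spell out the formulation, so this is an assumption and not the authors' implementation. The function names `estimate_noise_matrix` and `apply_noise_layer` are hypothetical, and the matrix is estimated here by simple counting over tokens that carry both a clean and an automatically assigned label.

```python
def estimate_noise_matrix(clean_labels, noisy_labels, num_classes):
    """Estimate p(noisy = j | clean = i) by counting label pairs.

    Assumption: a small set of items has both a clean (manual) label and a
    noisy (automatic) label, given as integer class ids.
    """
    counts = [[0] * num_classes for _ in range(num_classes)]
    for clean, noisy in zip(clean_labels, noisy_labels):
        counts[clean][noisy] += 1

    matrix = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # No evidence for this class: assume its labels are not corrupted.
            matrix.append([1.0 if j == i else 0.0 for j in range(num_classes)])
        else:
            matrix.append([count / total for count in row])
    return matrix


def apply_noise_layer(clean_probs, noise_matrix):
    """Map a distribution over clean labels to one over noisy labels.

    Computes p(noisy = j) = sum_i p(clean = i) * p(noisy = j | clean = i).
    """
    num_classes = len(noise_matrix)
    return [
        sum(clean_probs[i] * noise_matrix[i][j] for i in range(num_classes))
        for j in range(num_classes)
    ]


# Toy example with three classes; class 0 is mislabeled as class 1 half the time.
noise_matrix = estimate_noise_matrix([0, 0, 1, 1, 2], [0, 1, 1, 1, 2], 3)
noisy_probs = apply_noise_layer([1.0, 0.0, 0.0], noise_matrix)
```

Under this assumption, a model would be trained on clean data using its own output distribution and on noisy data using the output of the noise layer, so that the base model is not forced to reproduce the labeling errors directly.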

research
03/28/2019

Handling Noisy Labels for Robustly Learning from Self-Training Data for Low-Resource Sequence Labeling

In this paper, we address the problem of effectively self-training neura...
research
10/14/2019

Feature-Dependent Confusion Matrices for Low-Resource NER Labeling with Noisy Labels

In low-resource settings, the performance of supervised labeling models ...
research
08/26/2019

Low-Resource Name Tagging Learned with Weakly Labeled Data

Name tagging in low-resource languages or domains suffers from inadequat...
research
04/02/2019

Data Augmentation for Context-Sensitive Neural Lemmatization Using Inflection Tables and Raw Text

Lemmatization aims to reduce the sparse data problem by relating the inf...
research
10/17/2022

Transferring Knowledge via Neighborhood-Aware Optimal Transport for Low-Resource Hate Speech Detection

The concerning rise of hateful content on online platforms has increased...
research
07/14/2022

Multilinguals at SemEval-2022 Task 11: Complex NER in Semantically Ambiguous Settings for Low Resource Languages

We leverage pre-trained language models to solve the task of complex NER...
research
06/03/2022

Task-Adaptive Pre-Training for Boosting Learning With Noisy Labels: A Study on Text Classification for African Languages

For high-resource languages like English, text classification is a well-...
