Towards Robustness to Label Noise in Text Classification via Noise Modeling

01/27/2021
by   Siddhant Garg, et al.

Large datasets in NLP suffer from noisy labels, due to erroneous automatic and human annotation procedures. We study the problem of text classification with label noise, and aim to capture this noise through an auxiliary noise model over the classifier. We first assign a probability score to each training sample of having a noisy label, through a beta mixture model fitted on the losses at an early epoch of training. Then, we use this score to selectively guide the learning of the noise model and classifier. Our empirical evaluation on two text classification tasks shows that our approach can improve over the baseline accuracy, and prevent over-fitting to the noise.
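The noise-scoring step described above — fitting a two-component beta mixture to per-sample training losses and reading off each sample's posterior probability of belonging to the high-loss (noisy) component — can be sketched as follows. This is a minimal illustration with synthetic losses and a moment-matching EM fit, not the authors' implementation; the component initialization and the number of EM iterations are assumptions.

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)
# Synthetic per-sample losses normalized to (0, 1): clean samples tend to
# have low loss at an early epoch, noisy-label samples high loss.
losses = np.concatenate([rng.beta(2, 8, 800), rng.beta(8, 2, 200)])


def fit_beta_mixture(x, n_iter=50):
    """EM for a 2-component beta mixture, with a moment-matching M-step."""
    x = np.clip(x, 1e-4, 1 - 1e-4)
    # Initialize responsibilities by splitting on the median loss.
    hi = (x > np.median(x)).astype(float)
    resp = np.stack([1 - hi, hi], axis=1)
    for _ in range(n_iter):
        weights = resp.mean(axis=0)
        params = []
        for k in range(2):
            w = resp[:, k] / resp[:, k].sum()
            m = (w * x).sum()                      # weighted mean
            v = (w * (x - m) ** 2).sum()           # weighted variance
            c = m * (1 - m) / max(v, 1e-6) - 1     # method of moments
            params.append((max(m * c, 1e-2), max((1 - m) * c, 1e-2)))
        # E-step: posterior responsibility of each component per sample.
        pdf = np.stack(
            [weights[k] * beta.pdf(x, *params[k]) for k in range(2)], axis=1
        )
        resp = pdf / pdf.sum(axis=1, keepdims=True)
    return params, weights


params, weights = fit_beta_mixture(losses)
# The noisy component is the one with the larger mean a / (a + b).
noisy = int(np.argmax([a / (a + b) for a, b in params]))
x = np.clip(losses, 1e-4, 1 - 1e-4)
pdf = np.stack(
    [weights[k] * beta.pdf(x, *params[k]) for k in range(2)], axis=1
)
noise_score = pdf[:, noisy] / pdf.sum(axis=1)  # P(noisy label | loss)
```

In the paper's pipeline, `noise_score` would then gate how much each sample trains the noise model versus the classifier; here it simply separates the two synthetic populations.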


Related research

04/20/2022 · Is BERT Robust to Label Noise? A Study on Learning with Noisy Labels in Text Classification
Incorrect labels in training data occur when human annotators make mista...

03/18/2019 · An Effective Label Noise Model for DNN Text Classification
Because large, human-annotated datasets suffer from labeling errors, it ...

06/03/2022 · Task-Adaptive Pre-Training for Boosting Learning With Noisy Labels: A Study on Text Classification for African Languages
For high-resource languages like English, text classification is a well-...

08/24/2018 · Building a Robust Text Classifier on a Test-Time Budget
We propose a generic and interpretable learning framework for building r...

09/11/2018 · Training and Prediction Data Discrepancies: Challenges of Text Classification with Noisy, Historical Data
Industry datasets used for text classification are rarely created for th...

04/25/2019 · Unsupervised Label Noise Modeling and Loss Correction
Despite being robust to small amounts of label noise, convolutional neur...
