Mind the Trade-off: Debiasing NLU Models without Degrading the In-distribution Performance

05/01/2020
by   Prasetya Ajie Utama, et al.

Models for natural language understanding (NLU) tasks often rely on the idiosyncratic biases of the dataset, which make them brittle against test cases outside the training distribution. Several recently proposed debiasing methods have been shown to be very effective in improving out-of-distribution performance. However, their improvements come at the expense of a performance drop when models are evaluated on in-distribution data, which contain examples with higher diversity. This seemingly inevitable trade-off may not tell us much about the changes in the reasoning and understanding capabilities of the resulting models on broader types of examples beyond the small subset represented in the out-of-distribution data. In this paper, we address this trade-off by introducing a novel debiasing method, called confidence regularization, which discourages models from exploiting biases while enabling them to receive enough incentive to learn from all the training examples. We evaluate our method on three NLU tasks and show that, in contrast to its predecessors, it improves performance on out-of-distribution datasets (e.g., a 7pp gain on the HANS dataset) while maintaining the original in-distribution accuracy.
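One way to read "discouraging models from exploiting biases while still learning from all examples" is as a self-distillation objective whose soft targets are flattened in proportion to a bias-only model's confidence on the gold label: the more an example is solvable by the bias alone, the weaker the supervision signal it provides, yet every example still contributes to the loss. The sketch below is a minimal, hypothetical NumPy illustration of that idea (the function names, the exponent-based scaling, and the variable shapes are our assumptions, not the paper's exact formulation):

```python
import numpy as np

def scale_teacher_probs(teacher_probs, bias_conf):
    """Flatten teacher targets in proportion to the bias model's confidence.

    teacher_probs: (n_examples, n_classes) softmax output of a teacher model.
    bias_conf:     (n_examples,) bias-only model's probability on the gold label.

    Raising each distribution to the power (1 - bias_conf) and renormalizing
    is one simple scaling choice: bias_conf = 0 leaves the target unchanged,
    bias_conf = 1 collapses it to uniform (no incentive to fit the bias).
    """
    exponents = (1.0 - bias_conf)[:, None]          # per-example scaling power
    scaled = teacher_probs ** exponents
    return scaled / scaled.sum(axis=1, keepdims=True)

def confidence_regularized_loss(student_log_probs, teacher_probs, bias_conf):
    """Cross-entropy of the student against the bias-scaled soft targets."""
    targets = scale_teacher_probs(teacher_probs, bias_conf)
    return -(targets * student_log_probs).sum(axis=1).mean()
```

Because the scaled targets stay valid probability distributions, every training example keeps a non-zero gradient; biased examples are merely down-weighted rather than filtered out, which is consistent with the paper's stated goal of preserving in-distribution accuracy.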


