Unlearn Dataset Bias in Natural Language Inference by Fitting the Residual

08/28/2019
by He He, et al.

Statistical natural language inference (NLI) models are susceptible to learning dataset bias: superficial cues that happen to associate with the label on a particular dataset, but are not useful in general, e.g., negation words indicate contradiction. As exposed by several recent challenge datasets, these models perform poorly when such association is absent, e.g., predicting that "I love dogs" contradicts "I don't love cats". Our goal is to design learning algorithms that guard against known dataset bias. We formalize the concept of dataset bias under the framework of distribution shift and present a simple debiasing algorithm based on residual fitting, which we call DRiFt. We first learn a biased model that only uses features that are known to relate to dataset bias. Then, we train a debiased model that fits to the residual of the biased model, focusing on examples that cannot be predicted well by biased features only. We use DRiFt to train three high-performing NLI models on two benchmark datasets, SNLI and MNLI. Our debiased models achieve significant gains over baseline models on two challenge test sets, while maintaining reasonable performance on the original test sets.
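The two-step procedure in the abstract can be illustrated with a small sketch. The abstract does not state the training objective, so this assumes one common way to realize residual fitting: the biased model is frozen, its log-probabilities are added to the debiased model's scores, and the cross-entropy of the combined prediction is minimized with respect to the debiased scores only. The function names (`log_softmax`, `residual_loss_and_grad`) and the toy numbers are hypothetical and are not taken from the paper.

```python
import math

def log_softmax(scores):
    """Numerically stable log-softmax over a list of class scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]

def residual_loss_and_grad(biased_log_probs, debiased_scores, label):
    """Cross-entropy of the combined (biased + debiased) prediction.

    Assumption: the biased model is frozen, so the gradient is taken
    only with respect to the debiased model's scores.
    """
    combined = [b + d for b, d in zip(biased_log_probs, debiased_scores)]
    log_p = log_softmax(combined)
    loss = -log_p[label]
    # Gradient of softmax cross-entropy: predicted probability minus one-hot label.
    grad = [math.exp(lp) - (1.0 if k == label else 0.0)
            for k, lp in enumerate(log_p)]
    return loss, grad

# Toy 3-class setup; the true label is class 0 in both examples.
easy = log_softmax([4.0, 0.0, 0.0])  # biased features already predict the label
hard = log_softmax([0.0, 4.0, 0.0])  # biased features point to the wrong label
zero = [0.0, 0.0, 0.0]               # debiased model before any training

easy_loss, easy_grad = residual_loss_and_grad(easy, zero, 0)
hard_loss, hard_grad = residual_loss_and_grad(hard, zero, 0)
print(easy_loss, hard_loss)
```

Under this assumed objective, the example that the biased features already explain contributes a much smaller loss and gradient than the example they get wrong, which matches the abstract's description of focusing on examples that cannot be predicted well by biased features only.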


