Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision

04/20/2020
by   Damien Teney, et al.

One of the primary challenges limiting the applicability of deep learning is its susceptibility to learning spurious correlations rather than the underlying mechanisms of the task of interest. The resulting failures to generalize cannot be addressed by simply using more data from the same distribution. We propose an auxiliary training objective that improves the generalization capabilities of neural networks by leveraging an overlooked supervisory signal found in existing datasets. We use pairs of minimally different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task. We show that such pairs can be identified in a number of existing datasets in computer vision (visual question answering, multi-label image classification) and natural language processing (sentiment analysis, natural language inference). The new training objective orients the gradient of a model's decision function with pairs of counterfactual examples. Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
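The core idea of orienting the decision function's gradient with a counterfactual pair can be illustrated with a small sketch. The snippet below is a minimal, hypothetical illustration (not the authors' exact formulation): given a pair of minimally different inputs `x_a` and `x_b` with different labels, it penalizes misalignment between the model's input gradient at `x_a` and the direction `x_b - x_a` along which the label changes, using one minus cosine similarity as the auxiliary loss. A toy linear scorer `f(x) = w . x`, whose input gradient is simply `w`, stands in for a trained network.

```python
import numpy as np

def gradient_supervision_loss(grad, x_a, x_b, eps=1e-8):
    """Auxiliary loss encouraging the decision function's input gradient
    at x_a to point toward the counterfactual example x_b.

    Returns 1 - cosine_similarity(grad, x_b - x_a): 0 when the gradient
    is perfectly aligned with the label-changing direction, up to 2 when
    it points the opposite way.
    """
    direction = x_b - x_a
    cos = np.dot(grad, direction) / (
        (np.linalg.norm(grad) + eps) * (np.linalg.norm(direction) + eps)
    )
    return 1.0 - cos

# Toy linear scorer f(x) = w . x; its gradient w.r.t. the input is w.
w = np.array([1.0, 0.0])
x_a = np.array([0.0, 0.0])   # example with one label
x_b = np.array([2.0, 0.0])   # minimally different counterfactual, other label

aligned_loss = gradient_supervision_loss(w, x_a, x_b)        # gradient aligned
opposed_loss = gradient_supervision_loss(-w, x_a, x_b)       # gradient opposed
```

In practice this term would be added, with a weighting coefficient, to the main task loss, and the input gradient would come from automatic differentiation rather than a closed form.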


Related research

- 05/18/2017 — Learning Convolutional Text Representations for Visual Question Answering
  Visual question answering is a recently proposed artificial intelligence...

- 03/24/2022 — Generating Data to Mitigate Spurious Correlations in Natural Language Inference Datasets
  Natural language processing models often exploit spurious correlations b...

- 09/26/2019 — Learning the Difference that Makes a Difference with Counterfactually-Augmented Data
  Despite alarm over the reliance of machine learning systems on so-called...

- 08/03/2022 — SpanDrop: Simple and Effective Counterfactual Learning for Long Sequences
  Distilling supervision signal from a long sequence to make predictions i...

- 02/07/2022 — Diversify and Disambiguate: Learning From Underspecified Data
  Many datasets are underspecified, which means there are several equally ...

- 10/19/2022 — CPL: Counterfactual Prompt Learning for Vision and Language Models
  Prompt tuning is a new few-shot transfer learning technique that only tu...

- 09/10/2021 — Counterfactual Adversarial Learning with Representation Interpolation
  Deep learning models exhibit a preference for statistical fitting over l...
