Combining Feature and Instance Attribution to Detect Artifacts

07/01/2021
by Pouya Pezeshkpour, et al.

Training the large deep neural networks that dominate NLP requires large datasets. Many of these are collected automatically or via crowdsourcing, and may exhibit systematic biases or annotation artifacts. By the latter, we mean correlations between inputs and outputs that are spurious, insofar as they do not represent a generally held causal relationship between features and classes; models that exploit such correlations may appear to perform a given task well, but fail on out-of-sample data. In this paper we propose methods to facilitate the identification of training data artifacts, using new hybrid approaches that combine saliency maps (which highlight important input features) with instance attribution methods (which retrieve training samples influential to a given prediction). We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data, and we use it to identify previously unreported artifacts in a few standard NLP datasets. We conduct a small user study to evaluate whether these methods are useful to NLP researchers in practice, with promising results. We make code for all methods and experiments in this paper available.
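To make the combination concrete, the sketch below pairs a gradient-similarity instance attribution score (in the spirit of TracIn, using a single checkpoint) with input-times-gradient saliency on a toy logistic-regression classifier. The model, the planted "artifact" feature, and all variable names are illustrative assumptions, not the paper's actual implementation; the point is only the shape of the computation: score training examples by influence on a test prediction, then attribute the most influential example's loss to its input features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: binary logistic regression over 5 synthetic features.
# Feature 0 is a planted "artifact": the labels depend only on it.
n_train, n_feat = 20, 5
X_train = rng.normal(size=(n_train, n_feat))
y_train = (X_train[:, 0] > 0).astype(float)
w = np.zeros(n_feat)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_loss(w, x, y):
    # Gradient of the logistic loss w.r.t. the weights for one example.
    return (sigmoid(x @ w) - y) * x

# Brief full-batch gradient descent.
for _ in range(200):
    g = np.mean([grad_loss(w, x, y) for x, y in zip(X_train, y_train)], axis=0)
    w -= 0.5 * g

x_test, y_test = rng.normal(size=n_feat), 1.0

# Instance attribution (TracIn-style, one checkpoint): dot product of the
# test example's loss gradient with each training example's loss gradient.
g_test = grad_loss(w, x_test, y_test)
influence = np.array([g_test @ grad_loss(w, x, y)
                      for x, y in zip(X_train, y_train)])

# Feature attribution (input-times-gradient saliency) on the single most
# influential training example: for logistic regression, the loss gradient
# w.r.t. the input x is (sigmoid(x @ w) - y) * w.
top = int(np.argmax(influence))
residual = sigmoid(X_train[top] @ w) - y_train[top]
saliency = X_train[top] * residual * w

# "Training-feature attribution": per-feature contribution of the most
# influential training example; mass concentrated on feature 0 would flag
# the planted artifact.
print("most influential training example:", top)
print("per-feature attribution:", np.round(saliency, 3))
```

In practice one would inspect these per-feature scores aggregated over the top-k influential training examples rather than a single one; the single-example version above is only the smallest runnable illustration of the pipeline.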


