How does this interaction affect me? Interpretable attribution for feature interactions

06/19/2020
by Michael Tsang, et al.

Machine learning transparency calls for interpretable explanations of how inputs relate to predictions. Feature attribution is a way to analyze the impact of features on predictions. Feature interactions are the contextual dependence between features that jointly impact predictions. There are a number of methods that extract feature interactions in prediction models; however, the methods that assign attributions to interactions are either uninterpretable, model-specific, or non-axiomatic. We propose an interaction attribution and detection framework called Archipelago which addresses these problems and is also scalable in real-world settings. Our experiments on standard annotation labels indicate our approach provides significantly more interpretable explanations than comparable methods, which is important for analyzing the impact of interactions on predictions. We also provide accompanying visualizations of our approach that give new insights into deep neural networks.
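To make the notion of a feature interaction concrete, here is a minimal, hypothetical sketch (not the paper's actual Archipelago implementation) of the standard discrete mixed-difference test for a pairwise interaction: toggle two features on and off against a neutral baseline, and check whether their joint effect differs from the sum of their individual effects. The function name and baseline choice below are illustrative assumptions.

```python
import numpy as np

def pairwise_interaction_strength(f, x, baseline, i, j):
    """Estimate the interaction between features i and j of input x via a
    discrete mixed second-order difference against a baseline.

    f        : callable mapping a 1-D feature vector to a scalar prediction
    x        : 1-D array, the input being explained
    baseline : 1-D array of "neutral" feature values (e.g., zeros or means)
    """
    # Start from the baseline and toggle features i and j to their values in x.
    x_ij = baseline.copy(); x_ij[[i, j]] = x[[i, j]]   # both features on
    x_i = baseline.copy(); x_i[i] = x[i]               # only feature i on
    x_j = baseline.copy(); x_j[j] = x[j]               # only feature j on
    # A nonzero mixed difference means the joint effect of (i, j) is not
    # the sum of their individual effects, i.e., the features interact.
    return f(x_ij) - f(x_i) - f(x_j) + f(baseline)

# Toy model with a multiplicative interaction between features 0 and 1.
f = lambda v: 3.0 * v[0] * v[1] + v[2]
x = np.array([1.0, 2.0, 5.0])
baseline = np.zeros(3)

print(pairwise_interaction_strength(f, x, baseline, 0, 1))  # 6.0 -> interaction
print(pairwise_interaction_strength(f, x, baseline, 0, 2))  # 0.0 -> no interaction
```

For a purely additive model this quantity is zero for every feature pair, which is why a nonzero value is evidence of the contextual dependence described above.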


Related research

The Weighted Möbius Score: A Unified Framework for Feature Attribution (05/16/2023)
Feature attribution aims to explain the reasoning behind a black-box mod...

Global Explanations of Neural Networks: Mapping the Landscape of Predictions (02/06/2019)
A barrier to the wider adoption of neural networks is their lack of inte...

Asymmetric feature interaction for interpreting model predictions (05/12/2023)
In natural language processing (NLP), deep neural networks (DNNs) could ...

PredDiff: Explanations and Interactions from Conditional Expectations (02/26/2021)
PredDiff is a model-agnostic, local attribution method that is firmly ro...

Feature Interactions Reveal Linguistic Structure in Language Models (06/21/2023)
We study feature interactions in the context of feature attribution meth...

Translation Error Detection as Rationale Extraction (08/27/2021)
Recent Quality Estimation (QE) models based on multilingual pre-trained ...

Interpretable Neural Networks with Frank-Wolfe: Sparse Relevance Maps and Relevance Orderings (10/15/2021)
We study the effects of constrained optimization formulations and Frank-...
