Issues with post-hoc counterfactual explanations: a discussion

06/11/2019
by   Thibault Laugel, et al.

Post-hoc counterfactual interpretability approaches have proven useful for generating explanations of the predictions of a trained black-box classifier. However, the assumptions they make about the data and the classifier render them unreliable in many contexts. In this paper, we discuss three desirable properties, along with approaches to quantify them: proximity, connectedness, and stability. We further illustrate the risk that post-hoc counterfactual approaches fail to satisfy these properties.
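To make the setting concrete, a post-hoc counterfactual explanation is a nearby instance that the black box classifies differently, and proximity measures how close it stays to the original input. The sketch below is purely illustrative (not the paper's method): it uses a toy linear rule as the black box and bisects toward a point of the opposite class to find a boundary-crossing counterfactual, then reports its proximity as an L2 distance.

```python
# Illustrative sketch only: a toy black box and a bisection search for a
# nearby counterfactual; `black_box`, `counterfactual`, and the example
# points are all hypothetical, not from the paper.
import math

def black_box(x):
    """Toy black-box classifier: a fixed linear decision rule."""
    return 1 if 0.8 * x[0] + 0.6 * x[1] > 1.0 else 0

def counterfactual(x, x_other, predict, steps=50):
    """Bisect between x and a point x_other predicted differently,
    returning the closest boundary-crossing point found."""
    assert predict(x) != predict(x_other)
    lo, hi = x, x_other
    for _ in range(steps):
        mid = tuple((a + b) / 2 for a, b in zip(lo, hi))
        if predict(mid) == predict(x):
            lo = mid  # still on the original side: move inward
        else:
            hi = mid  # crossed the boundary: tighten the counterfactual
    return hi

def proximity(x, cf):
    """Proximity as Euclidean distance to the original instance."""
    return math.dist(x, cf)

x = (0.2, 0.2)    # instance to explain (predicted class 0)
ref = (1.5, 1.5)  # any point predicted as the opposite class
cf = counterfactual(x, ref, black_box)
print(black_box(x), black_box(cf), round(proximity(x, cf), 3))
```

A counterfactual found this way is close by construction, but, as the paper's discussion suggests, nothing guarantees it lies in a region connected to the training data or that it is stable under small changes of the query.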


research
07/22/2019

The Dangers of Post-hoc Interpretability: Unjustified Counterfactual Explanations

Post-hoc interpretability approaches have been proven to be powerful too...
research
03/29/2022

Diffusion Models for Counterfactual Explanations

Counterfactual explanations have shown promising results as a post-hoc f...
research
06/04/2018

Do the laws of physics prohibit counterfactual communication?

It has been conjectured that counterfactual communication is impossible,...
research
12/22/2017

Inverse Classification for Comparison-based Interpretability in Machine Learning

In the context of post-hoc interpretability, this paper addresses the ta...
research
07/27/2023

Counterfactual Explanations for Graph Classification Through the Lenses of Density

Counterfactual examples have emerged as an effective approach to produce...
research
12/04/2018

Multimodal Explanations by Predicting Counterfactuality in Videos

This study addresses generating counterfactual explanations with multimo...
research
10/21/2019

Towards User Empowerment

Counterfactual explanations can be obtained by identifying the smallest ...
