An Empirical Study on Explanations in Out-of-Domain Settings

02/28/2022
by George Chrysostomou, et al.

Recent work in Natural Language Processing has focused on developing approaches that extract faithful explanations, either by identifying the most important tokens in the input (i.e. post-hoc explanations) or by designing inherently faithful models that first select the most important tokens and then use them to predict the correct label (i.e. select-then-predict models). Currently, these approaches are largely evaluated in in-domain settings. Yet, little is known about how post-hoc explanations and inherently faithful models perform in out-of-domain settings. In this paper, we conduct an extensive empirical study that examines: (1) the out-of-domain faithfulness of post-hoc explanations generated by five feature attribution methods; and (2) the out-of-domain performance of two inherently faithful models over six datasets. Contrary to our expectations, results show that in many cases out-of-domain post-hoc explanation faithfulness, measured by sufficiency and comprehensiveness, is higher than in-domain. We find this misleading and suggest using a random baseline as a yardstick for evaluating post-hoc explanation faithfulness. Our findings also show that select-then-predict models demonstrate comparable predictive performance in out-of-domain settings to models trained on the full text.
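
The faithfulness metrics named in the abstract, sufficiency and comprehensiveness, are commonly defined as the change in the model's predicted probability when the rationale tokens are kept or removed. The sketch below illustrates these standard definitions together with a random-rationale baseline of the kind the abstract argues for; `predict_proba` is a hypothetical model interface, and the paper's own implementation (e.g. normalised variants of these scores) may differ.

```python
# Minimal sketch of sufficiency/comprehensiveness and a random-rationale
# yardstick. `predict_proba` is a hypothetical callable returning class
# probabilities for a list of tokens; not the paper's exact implementation.
import random
from typing import Callable, List, Sequence

Predictor = Callable[[List[str]], Sequence[float]]


def comprehensiveness(predict_proba: Predictor, tokens: List[str],
                      rationale_idx: List[int], label: int) -> float:
    """Probability drop when rationale tokens are removed (higher = more faithful)."""
    p_full = predict_proba(tokens)[label]
    rationale = set(rationale_idx)
    remaining = [t for i, t in enumerate(tokens) if i not in rationale]
    p_without = predict_proba(remaining)[label]
    return p_full - p_without


def sufficiency(predict_proba: Predictor, tokens: List[str],
                rationale_idx: List[int], label: int) -> float:
    """Probability drop when only rationale tokens are kept (lower = more faithful)."""
    p_full = predict_proba(tokens)[label]
    rationale_only = [tokens[i] for i in rationale_idx]
    p_only = predict_proba(rationale_only)[label]
    return p_full - p_only


def random_rationale(tokens: List[str], k: int, seed: int = 0) -> List[int]:
    """Uniformly sampled token indices of the same length, used as a baseline."""
    rng = random.Random(seed)
    return rng.sample(range(len(tokens)), k)
```

Scoring an attribution-derived rationale against a random rationale of the same length under the same metric gives the yardstick suggested above: an out-of-domain sufficiency or comprehensiveness score is only meaningful if it clearly beats the random baseline, since both scores can look inflated out-of-domain.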
