Model Interpretability through the Lens of Computational Complexity

by Pablo Barceló, et al.

Despite frequent claims that some models are more interpretable than others (e.g., "linear models are more interpretable than deep neural networks"), we still lack a principled notion of interpretability for formally comparing different classes of models. We take a step towards such a notion by studying whether folklore interpretability claims have a correlate in computational complexity theory. We focus on local post-hoc explainability queries that, intuitively, attempt to answer why individual inputs are classified in a certain way by a given model. In a nutshell, we say that a class 𝒞_1 of models is more interpretable than another class 𝒞_2 if the computational complexity of answering post-hoc queries for models in 𝒞_2 is higher than for those in 𝒞_1. We prove that this notion provides a good theoretical counterpart to current beliefs on the interpretability of models; in particular, we show that under our definition, and under standard complexity-theoretic assumptions (such as P≠NP), both linear and tree-based models are strictly more interpretable than neural networks. Our complexity analysis, however, does not provide a clear-cut difference between linear and tree-based models, as we obtain different results depending on the particular post-hoc explanations considered. Finally, by applying a finer analysis based on parameterized complexity, we prove a theoretical result suggesting that shallow neural networks are more interpretable than deeper ones.
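To make the idea of a local post-hoc query concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the paper's formalization: the query chosen (the fewest binary feature flips that change a prediction) and every name in the code (`linear_classifier`, `min_flips_brute_force`, `min_flips_linear`) are hypothetical choices made for this example. The brute-force routine treats the model as a black box and may try exponentially many inputs, while the second routine exploits the structure of a linear model and needs only a sort; the gap between the two is the kind of difference a complexity-based comparison of model classes is meant to capture.

```python
from itertools import combinations


def linear_classifier(weights, bias):
    """Return a classifier over binary inputs: 1 if w.x + b >= 0, else 0."""
    def classify(x):
        score = sum(w * xi for w, xi in zip(weights, x)) + bias
        return 1 if score >= 0 else 0
    return classify


def min_flips_brute_force(classify, x):
    """Fewest feature flips that change classify(x), for ANY black-box model.

    Tries all subsets of features in order of size, so the running time
    is exponential in len(x) in the worst case.
    """
    original = classify(x)
    n = len(x)
    for k in range(1, n + 1):
        for idxs in combinations(range(n), k):
            y = list(x)
            for i in idxs:
                y[i] = 1 - y[i]
            if classify(y) != original:
                return k
    return None  # no set of flips changes the prediction


def min_flips_linear(weights, bias, x):
    """Same query, but using the linear structure (assumed known weights).

    Flipping feature i shifts the score by w_i * (1 - 2 * x_i), so it is
    enough to apply the most helpful shifts first until the sign changes.
    """
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    positive = score >= 0
    deltas = [w * (1 - 2 * xi) for w, xi in zip(weights, x)]
    # Keep only flips that push the score towards the opposite class.
    helpful = sorted(
        (d for d in deltas if d != 0 and (d < 0) == positive),
        key=abs,
        reverse=True,
    )
    for k, d in enumerate(helpful, start=1):
        score += d
        if (score >= 0) != positive:
            return k
    return None


# Hypothetical example data, chosen only for illustration.
weights, bias, x = [3, -2, 1, 4], -3, [1, 0, 1, 1]
model = linear_classifier(weights, bias)
brute = min_flips_brute_force(model, x)
fast = min_flips_linear(weights, bias, x)
```

On this example both routines agree on the answer. The brute-force version would still work if `model` were replaced by any other classifier over binary inputs, whereas the greedy shortcut is specific to linear models.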




The Mythos of Model Interpretability

Supervised machine learning models boast remarkable predictive capabilit...

Posthoc Interpretability of Learning to Rank Models using Secondary Training Data

Predictive models are omnipresent in automated and assisted decision mak...

Foundations of Symbolic Languages for Model Interpretability

Several queries and scores have recently been proposed to explain indivi...

Interpretability is in the Mind of the Beholder: A Causal Framework for Human-interpretable Representation Learning

Focus in Explainable AI is shifting from explanations defined in terms o...

On the Equivalence of the Weighted Tsetlin Machine and the Perceptron

Tsetlin Machine (TM) has been gaining popularity as an inherently interp...

(Un)reasonable Allure of Ante-hoc Interpretability for High-stakes Domains: Transparency Is Necessary but Insufficient for Explainability

Ante-hoc interpretability has become the holy grail of explainable machi...

TIP: Typifying the Interpretability of Procedures

We provide a novel notion of what it means to be interpretable, looking ...
