MEME: Generating RNN Model Explanations via Model Extraction

by Dmitry Kazhdan et al.
University of Cambridge

Recurrent Neural Networks (RNNs) have achieved remarkable performance on a range of tasks. A key step to further empowering RNN-based approaches is improving their explainability and interpretability. In this work we present MEME: a model extraction approach capable of approximating RNNs with interpretable models represented by human-understandable concepts and their interactions. We demonstrate how MEME can be applied to two multivariate, continuous data case studies: Room Occupation Prediction and In-Hospital Mortality Prediction. Using these case studies, we show how our extracted models can be used to interpret RNNs both locally and globally, by approximating RNN decision-making via interpretable concept interactions.
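The core idea of model extraction, as the abstract describes it, is to approximate a black-box RNN with an interpretable surrogate expressed over human-understandable concepts, then check how faithfully the surrogate mimics the original. The sketch below is purely illustrative and not the paper's actual algorithm: the stand-in "RNN" (a threshold on the sequence mean), the binary "high reading" concept, and the majority-vote rule are all hypothetical placeholders chosen to keep the example self-contained.

```python
import random

random.seed(0)

# Hypothetical stand-in for a trained RNN on occupancy-style sensor
# sequences: predicts "occupied" (1) when the mean reading exceeds 0.5.
def black_box_rnn(seq):
    return 1 if sum(seq) / len(seq) > 0.5 else 0

# Assumed concept extraction for illustration: map each timestep to a
# binary "high reading" concept (value above 0.5).
def concepts(seq):
    return [1 if v > 0.5 else 0 for v in seq]

# Interpretable surrogate: a human-readable rule over concept activations,
# "predict occupied when most timesteps show a high reading".
def surrogate(seq):
    c = concepts(seq)
    return 1 if sum(c) > len(c) / 2 else 0

# Fidelity: the fraction of inputs on which the surrogate agrees with
# the black-box model -- a standard way to evaluate extracted models.
data = [[random.random() for _ in range(20)] for _ in range(500)]
agree = sum(surrogate(s) == black_box_rnn(s) for s in data)
fidelity = agree / len(data)
print(f"fidelity: {fidelity:.2f}")
```

A surrogate like this gives local explanations (which concepts fired for a given input) and a global one (the rule itself), at the cost of some fidelity to the original model.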


Now You See Me (CME): Concept-based Model Extraction

Deep Neural Networks (DNNs) have achieved remarkable performance on a ra...

Interpretable Additive Recurrent Neural Networks For Multivariate Clinical Time Series

Time series models with recurrent neural networks (RNNs) can have high a...

NeuroView-RNN: It's About Time

Recurrent Neural Networks (RNNs) are important tools for processing sequ...

On Attribution of Recurrent Neural Network Predictions via Additive Decomposition

RNN models have achieved the state-of-the-art performance in a wide rang...

Transfer Learning for Clinical Time Series Analysis using Recurrent Neural Networks

Deep neural networks have shown promising results for various clinical p...

Learning with Interpretable Structure from RNN

In structure learning, the output is generally a structure that is used ...

Implicit N-grams Induced by Recurrence

Although self-attention based models such as Transformers have achieved ...
