DIME: Fine-grained Interpretations of Multimodal Models via Disentangled Local Explanations

by   Yiwei Lyu, et al.

The ability for a human to understand an Artificial Intelligence (AI) model's decision-making process is critical in enabling stakeholders to visualize model behavior, perform model debugging, promote trust in AI models, and assist in collaborative human-AI decision-making. As a result, the research fields of interpretable and explainable AI have gained traction within AI communities as well as interdisciplinary scientists seeking to apply AI in their subject areas. In this paper, we focus on advancing the state-of-the-art in interpreting multimodal models - a class of machine learning methods that tackle core challenges in representing and capturing interactions between heterogeneous data sources such as images, text, audio, and time-series data. Multimodal models have proliferated numerous real-world applications across healthcare, robotics, multimedia, affective computing, and human-computer interaction. By performing model disentanglement into unimodal contributions (UC) and multimodal interactions (MI), our proposed approach, DIME, enables accurate and fine-grained analysis of multimodal models while maintaining generality across arbitrary modalities, model architectures, and tasks. Through a comprehensive suite of experiments on both synthetic and real-world multimodal tasks, we show that DIME generates accurate disentangled explanations, helps users of multimodal models gain a deeper understanding of model behavior, and presents a step towards debugging and improving these models for real-world deployment. Code for our experiments can be found at https://github.com/lvyiwei1/DIME.


page 1

page 5

page 9

page 10


MultiViz: An Analysis Benchmark for Visualizing and Understanding Multimodal Models

The promise of multimodal models for real-world applications has inspire...

MultiBench: Multiscale Benchmarks for Multimodal Representation Learning

Learning multimodal representations involves integrating information fro...

A Meta-Analysis of the Utility of Explainable Artificial Intelligence in Human-AI Decision-Making

Research in artificial intelligence (AI)-assisted decision-making is exp...

Reveal to Revise: An Explainable AI Life Cycle for Iterative Bias Correction of Deep Models

State-of-the-art machine learning models often learn spurious correlatio...

Integrated multimodal artificial intelligence framework for healthcare applications

Artificial intelligence (AI) systems hold great promise to improve healt...

TalkToModel: Understanding Machine Learning Models With Open Ended Dialogues

Machine Learning (ML) models are increasingly used to make critical deci...

FairCVtest Demo: Understanding Bias in Multimodal Learning with a Testbed in Fair Automatic Recruitment

With the aim of studying how current multimodal AI algorithms based on h...

Please sign up or login with your details

Forgot password? Click here to reset