One-Versus-Others Attention: Scalable Multimodal Integration

07/11/2023
by   Michal Golovanevsky, et al.

Multimodal learning models have become increasingly important as they surpass single-modality approaches on diverse tasks ranging from question answering to autonomous driving. Despite the importance of multimodal learning, existing efforts focus on NLP applications, where the number of modalities is typically fewer than four (audio, video, text, images). However, data inputs in other domains, such as the medical field, may include X-rays, PET scans, MRIs, genetic screening, clinical notes, and more, creating a need for both efficient and accurate information fusion. Many state-of-the-art models rely on pairwise cross-modal attention, which does not scale well for applications with more than three modalities: for n modalities, computing pairwise attention requires n(n-1), i.e. O(n^2), attention operations, potentially demanding considerable computational resources. To address this, we propose a new domain-neutral attention mechanism, One-Versus-Others (OvO) attention, that scales linearly with the number of modalities and requires only n attention operations, thus offering a significant reduction in computational complexity compared to existing cross-modal attention algorithms. Using three diverse real-world datasets as well as an additional simulation experiment, we show that our method improves performance compared to popular fusion techniques while decreasing computation costs.
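The scaling idea can be illustrated with a minimal sketch: instead of running a separate cross-attention pass for every ordered pair of modalities, each modality is attended against the group of all other modalities in a single pass, giving n operations total. This is an illustrative simplification, not the authors' exact formulation; the scoring function (a shared bilinear weight matrix `W`) and the `ovo_attention` helper below are assumptions made for the sketch.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def ovo_attention(modalities, W):
    """One-versus-others sketch: each modality embedding is scored
    against every *other* modality in a single attention pass, so n
    modalities cost n passes rather than the n*(n-1) passes of
    pairwise cross-modal attention. W is a hypothetical shared
    bilinear weight matrix, not the paper's exact parameterization."""
    outputs = []
    for i, x_i in enumerate(modalities):
        others = [x_j for j, x_j in enumerate(modalities) if j != i]
        # bilinear score between modality i and each of the others
        scores = np.array([x_j @ W @ x_i for x_j in others])
        weights = softmax(scores)
        # fuse: attention-weighted sum of the other modalities
        outputs.append(sum(w * x_j for w, x_j in zip(weights, others)))
    return outputs

# toy example: 4 modality embeddings of dimension 8
rng = np.random.default_rng(0)
mods = [rng.normal(size=8) for _ in range(4)]
W = rng.normal(size=(8, 8))
fused = ovo_attention(mods, W)
print(len(fused), fused[0].shape)  # 4 fused outputs, one attention pass each
```

With 10 modalities this sketch performs 10 attention passes, where pairwise cross-modal attention would perform 90, which is the linear-versus-quadratic gap the abstract describes.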


Related research

- 06/17/2022: Multimodal Attention-based Deep Learning for Alzheimer's Disease Diagnosis. "Alzheimer's Disease (AD) is the most common neurodegenerative disorder w..."
- 08/08/2023: OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation. "This paper presents OmniDataComposer, an innovative approach for multimo..."
- 04/19/2019: EmbraceNet: A robust deep learning architecture for multimodal classification. "Classification using multimodal data arises in many machine learning app..."
- 10/21/2021: Multimodal Learning using Optimal Transport for Sarcasm and Humor Detection. "Multimodal learning is an emerging yet challenging research area. In thi..."
- 11/22/2020: Hierarchical Delta-Attention Method for Multimodal Fusion. "In vision and linguistics; the main input modalities are facial expressi..."
- 12/18/2022: Medical Diagnosis with Large Scale Multimodal Transformers: Leveraging Diverse Data for More Accurate Diagnosis. "Multimodal deep learning has been used to predict clinical endpoints and..."
- 06/30/2022: MultiViz: An Analysis Benchmark for Visualizing and Understanding Multimodal Models. "The promise of multimodal models for real-world applications has inspire..."
