MSAF: Multimodal Split Attention Fusion

12/13/2020
by Lang Su, et al.

Multimodal learning mimics the reasoning process of the human multi-sensory system, which is used to perceive the surrounding world. When making a prediction, the human brain tends to relate crucial cues from multiple sources of information. In this work, we propose a novel multimodal fusion module that learns to emphasize the more contributive features across all modalities. Specifically, the proposed Multimodal Split Attention Fusion (MSAF) module splits each modality into feature blocks with an equal number of channels and creates a joint representation that is used to generate soft attention for each channel across the feature blocks. Further, the MSAF module is designed to be compatible with features of various spatial dimensions and sequence lengths, making it suitable for both CNNs and RNNs. Thus, MSAF can be easily added to fuse the features of any unimodal networks and can reuse existing pretrained unimodal model weights. To demonstrate the effectiveness of our fusion module, we design three multimodal networks with MSAF for emotion recognition, sentiment analysis, and action recognition tasks. Our approach achieves competitive results in each task and outperforms other application-specific networks and multimodal fusion benchmarks.
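
The split-and-attend idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `msaf_sketch`, the parameter `block_channels`, the use of global average pooling, the hidden size, and the choice to apply the softmax across all blocks are assumptions made for illustration, and random matrices stand in for weights that would be learned in a real network.

```python
import numpy as np


def msaf_sketch(features, block_channels, rng=None):
    """Illustrative split-attention fusion (hypothetical, not the paper's code).

    features: list of arrays shaped (channels, *dims); dims may differ per
              modality (e.g. H x W for a CNN map, T for an RNN sequence).
    block_channels: number of channels in every feature block (assumed to
              divide each modality's channel count).
    """
    rng = np.random.default_rng(0) if rng is None else rng

    # 1. Split each modality into blocks with an equal number of channels.
    blocks, owners = [], []
    for m, feat in enumerate(features):
        if feat.shape[0] % block_channels != 0:
            raise ValueError("channel count must be divisible by block_channels")
        for start in range(0, feat.shape[0], block_channels):
            blocks.append(feat[start:start + block_channels])
            owners.append(m)

    # 2. Pool every block over its non-channel axes (assumption: average
    #    pooling), so blocks of any spatial size or length become (C,) vectors.
    pooled = np.stack(
        [b.reshape(block_channels, -1).mean(axis=1) for b in blocks]
    )  # shape: (num_blocks, C)

    # 3. Joint representation: sum of pooled blocks, then a small dense layer.
    joint = pooled.sum(axis=0)
    hidden = max(block_channels // 2, 1)
    w_joint = rng.standard_normal((hidden, block_channels)) * 0.1  # stand-in weights
    z = np.maximum(w_joint @ joint, 0.0)  # ReLU

    # 4. Per-block logits, then a softmax across blocks for each channel
    #    (assumption: the softmax runs over all blocks of all modalities).
    w_blocks = rng.standard_normal((len(blocks), block_channels, hidden)) * 0.1
    logits = w_blocks @ z  # shape: (num_blocks, C)
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn = attn / attn.sum(axis=0, keepdims=True)

    # 5. Rescale each block by its attention and reassemble each modality.
    outputs = []
    for m, feat in enumerate(features):
        broadcast_shape = (block_channels,) + (1,) * (feat.ndim - 1)
        scaled = [
            attn[i].reshape(broadcast_shape) * blocks[i]
            for i in range(len(blocks))
            if owners[i] == m
        ]
        outputs.append(np.concatenate(scaled, axis=0))
    return outputs, attn


# Example: an 8-channel 4x4 feature map and a 4-channel length-10 sequence.
video = np.ones((8, 4, 4))
audio = np.ones((4, 10))
fused, attention = msaf_sketch([video, audio], block_channels=4)
print([f.shape for f in fused], attention.shape)
```

Each output keeps the shape of its input, which is the property that would let such a module sit between layers of existing unimodal networks without changing the layers that follow.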

research
11/20/2019

MMTM: Multimodal Transfer Module for CNN Fusion

In late fusion, each modality is processed in a separate unimodal Convol...
research
09/06/2022

Finger Multimodal Feature Fusion and Recognition Based on Channel Spatial Attention

Due to the instability and limitations of unimodal biometric systems, mu...
research
02/27/2023

Using Auxiliary Tasks In Multimodal Fusion Of Wav2vec 2.0 And BERT For Multimodal Emotion Recognition

The lack of data and the difficulty of multimodal fusion have always bee...
research
03/04/2022

MM-DFN: Multimodal Dynamic Fusion Network for Emotion Recognition in Conversations

Emotion Recognition in Conversations (ERC) has considerable prospects fo...
research
05/31/2018

Efficient Low-rank Multimodal Fusion with Modality-Specific Factors

Multimodal research is an emerging field of artificial intelligence, and...
research
03/31/2022

Dynamic Multimodal Fusion

Deep multimodal learning has achieved great progress in recent years. Ho...
research
09/28/2021

Neural Dependency Coding inspired Multimodal Fusion

Information integration from different modalities is an active area of r...
