Deep Multimodal Fusion by Channel Exchanging

11/10/2020
by   Yikai Wang, et al.
8

Deep multimodal fusion by using multiple sources of data for classification or regression has exhibited a clear advantage over the unimodal counterpart on various applications. Yet, current methods including aggregation-based and alignment-based fusion are still inadequate in balancing the trade-off between inter-modal fusion and intra-modal processing, incurring a bottleneck of performance improvement. To this end, this paper proposes Channel-Exchanging-Network (CEN), a parameter-free multimodal fusion framework that dynamically exchanges channels between sub-networks of different modalities. Specifically, the channel exchanging process is self-guided by individual channel importance that is measured by the magnitude of Batch-Normalization (BN) scaling factor during training. The validity of such exchanging process is also guaranteed by sharing convolutional filters yet keeping separate BN layers across modalities, which, as an add-on benefit, allows our multimodal architecture to be almost as compact as a unimodal network. Extensive experiments on semantic segmentation via RGB-D data and image translation through multi-domain input verify the effectiveness of our CEN compared to current state-of-the-art methods. Detailed ablation studies have also been carried out, which provably affirm the advantage of each component we propose. Our code is available at https://github.com/yikaiw/CEN.

READ FULL TEXT

page 8

page 13

page 16

page 17

page 18

page 19

page 20

research
12/04/2021

Channel Exchanging Networks for Multimodal and Multitask Dense Image Prediction

Multimodal fusion and multitask learning are two vital topics in machine...
research
08/11/2021

Learning Deep Multimodal Feature Representation with Asymmetric Multi-layer Fusion

We propose a compact and effective framework to fuse multimodal features...
research
04/19/2022

Multimodal Token Fusion for Vision Transformers

Many adaptations of transformers have emerged to address the single-moda...
research
09/07/2023

Multimodal Transformer for Material Segmentation

Leveraging information across diverse modalities is known to enhance per...
research
10/16/2020

Deep-HOSeq: Deep Higher Order Sequence Fusion for Multimodal Sentiment Analysis

Multimodal sentiment analysis utilizes multiple heterogeneous modalities...
research
01/31/2019

BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection

Multimodal representation learning is gaining more and more interest wit...
research
03/29/2022

Balanced Multimodal Learning via On-the-fly Gradient Modulation

Multimodal learning helps to comprehensively understand the world, by in...

Please sign up or login with your details

Forgot password? Click here to reset