Cross-Modal Message Passing for Two-stream Fusion

04/30/2019
by Dong Wang, et al.

Processing and fusing information across multiple modalities is a useful technique for achieving high performance in many computer vision problems. To handle multi-modal information more effectively, we introduce a novel framework for multi-modal fusion: Cross-modal Message Passing (CMMP). Specifically, we propose a cross-modal message passing mechanism to fuse a two-stream network for action recognition, which is composed of an appearance modal network (RGB image) and a motion modal network (optical flow image). The objectives of the individual networks in this framework are two-fold: a standard classification objective and a competing objective. The classification objective ensures that each modal network predicts the true action category, while the competing objective encourages each modal network to outperform the other. We quantitatively show that the proposed CMMP fuses the traditional two-stream network more effectively, and outperforms all existing two-stream fusion methods on the UCF-101 and HMDB-51 datasets.
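The abstract does not specify the exact form of the competing objective, but the two losses it describes can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the hinge form of the competing term, the function names, and the weighting factor `lam` are all assumptions; the sketch only captures the stated idea that each stream is trained to classify correctly while being pushed to beat the other stream's confidence on the true class.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, label):
    # Standard classification objective for one sample.
    return -np.log(probs[label] + 1e-12)

def cmmp_losses(rgb_logits, flow_logits, label, lam=0.5):
    """Sketch of the two objectives described in the abstract.

    Each stream gets a classification loss, plus a competing term that
    penalizes a stream whenever its confidence on the true class falls
    below the other stream's (the hinge form is an assumption).
    """
    p_rgb = softmax(rgb_logits)
    p_flow = softmax(flow_logits)

    cls_rgb = cross_entropy(p_rgb, label)
    cls_flow = cross_entropy(p_flow, label)

    # Competing objective: hinge on the gap in true-class confidence.
    compete_rgb = max(0.0, p_flow[label] - p_rgb[label])
    compete_flow = max(0.0, p_rgb[label] - p_flow[label])

    return cls_rgb + lam * compete_rgb, cls_flow + lam * compete_flow
```

Under this formulation, the stream that is currently more confident on the true class pays no competing penalty, while the weaker stream is pushed to catch up, so the two networks pressure each other toward stronger individual predictions before fusion.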
