Extending Compositional Attention Networks for Social Reasoning in Videos

10/03/2022
by Christina Sartzetaki, et al.

We propose a novel deep architecture for the task of reasoning about social interactions in videos. We leverage the multi-step reasoning capabilities of Compositional Attention Networks (MAC) and propose a multimodal extension (MAC-X). MAC-X is based on a recurrent cell that performs iterative mid-level fusion of input modalities (visual, auditory, text) over multiple reasoning steps, using a temporal attention mechanism. We then combine MAC-X with LSTMs for temporal input processing in an end-to-end architecture. Our ablation studies show that the proposed MAC-X architecture can effectively leverage multimodal input cues using mid-level fusion mechanisms. We apply MAC-X to the task of Social Video Question Answering on the Social IQ dataset and obtain a 2.5% improvement over the current state-of-the-art.
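The abstract describes MAC-X only at a high level. Below is a minimal sketch, assuming toy list-based feature vectors, of what iterative mid-level fusion with temporal attention can look like: at every reasoning step a query attends over the time steps of each modality separately, the per-modality summaries are merged, and a memory vector is updated. The read query (control plus memory), the averaging fusion, and the averaging write rule are hypothetical simplifications, not the authors' learned units. The per-step control vectors stand in for MAC's control unit, the LSTM input encoders are omitted, and all function names are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(query: Vector, sequence: List[Vector]) -> Vector:
    """Attend over the time steps of one modality and return their weighted sum."""
    weights = softmax([dot(query, frame) for frame in sequence])
    dim = len(sequence[0])
    return [sum(w * frame[d] for w, frame in zip(weights, sequence)) for d in range(dim)]


def reasoning_step(control: Vector, memory: Vector,
                   modalities: Dict[str, List[Vector]]) -> Vector:
    # Hypothetical read query: elementwise sum of control and memory
    # (MAC-X learns this interaction instead).
    query = [c + m for c, m in zip(control, memory)]
    # Mid-level fusion: read each modality separately, then merge the summaries.
    reads = [temporal_attention(query, seq) for seq in modalities.values()]
    fused = [sum(values) / len(reads) for values in zip(*reads)]
    # Hypothetical write unit: average the previous memory with the fused read.
    return [0.5 * (m + f) for m, f in zip(memory, fused)]


def mac_x_sketch(controls: List[Vector], modalities: Dict[str, List[Vector]]) -> Vector:
    """Run one reasoning step per control vector, starting from a zero memory."""
    memory = [0.0] * len(controls[0])
    for control in controls:
        memory = reasoning_step(control, memory, modalities)
    return memory


# Toy inputs (made-up values): 2-dimensional features, 3 time steps per modality.
modalities = {
    "visual": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    "audio": [[0.5, 0.5], [0.0, 0.0], [1.0, 0.0]],
    "text": [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]],
}
controls = [[1.0, 0.0], [0.0, 1.0]]  # one control vector per reasoning step
print(mac_x_sketch(controls, modalities))
```

In this sketch, fusing after the per-modality attention (rather than concatenating raw features before any attention) lets each modality select its own time steps at every reasoning step, and the fused result then feeds the next step through the memory.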

Related research

09/21/2018
Multimodal Dual Attention Memory for Video Story Question Answering
We propose a video story question-answering (QA) architecture, Multimoda...

03/08/2018
Compositional Attention Networks for Machine Reasoning
We present the MAC network, a novel fully differentiable neural network ...

10/19/2022
Dense but Efficient VideoQA for Intricate Compositional Reasoning
It is well known that most of the conventional video question answering ...

04/08/2019
Heterogeneous Memory Enhanced Multimodal Attention Model for Video Question Answering
In this paper, we propose a novel end-to-end trainable Video Question An...

07/04/2020
Modality Shifting Attention Network for Multi-modal Video Question Answering
This paper considers a network referred to as Modality Shifting Attentio...

10/26/2022
End-to-End Multimodal Representation Learning for Video Dialog
Video-based dialog task is a challenging multimodal learning task that h...

12/11/2021
COMPOSER: Compositional Learning of Group Activity in Videos
Group Activity Recognition (GAR) detects the activity performed by a gro...
