Continuous Emotion Recognition using Visual-audio-linguistic information: A Technical Report for ABAW3

03/24/2022
by Su Zhang, et al.

We propose a cross-modal co-attention model for continuous emotion recognition using visual, audio, and linguistic information. The model consists of four blocks. The visual, audio, and linguistic blocks learn spatial-temporal features from the multi-modal input. A co-attention block then fuses the learned features using a multi-head co-attention mechanism. The visual encoding from the visual block is concatenated with the attention feature to emphasize the visual information. To make full use of the data and alleviate over-fitting, cross-validation is carried out on the combined training and validation sets. Concordance correlation coefficient (CCC) centering is used to merge the results from each fold. The achieved CCC on the test set is 0.520 for valence and 0.602 for arousal, substantially higher than the baseline method's CCC of 0.180 for valence and 0.170 for arousal. The code is available at https://github.com/sucv/ABAW3.
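For readers unfamiliar with the evaluation metric, the following is a minimal sketch of the standard concordance correlation coefficient (Lin's CCC) between a predicted and a reference trace. It is not taken from the authors' repository, and it does not implement the CCC centering used to merge folds; the function name `ccc` and the plain-list inputs are assumptions made for illustration.

```python
def ccc(pred, gold):
    """Concordance correlation coefficient between two equal-length sequences.

    Illustrative helper (hypothetical name), using population statistics:
    CCC = 2 * cov / (var_pred + var_gold + (mean_pred - mean_gold) ** 2)
    """
    if len(pred) != len(gold) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(pred)
    mean_p = sum(pred) / n
    mean_g = sum(gold) / n

    # Population (biased) variance and covariance.
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_g = sum((g - mean_g) ** 2 for g in gold) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)) / n

    denom = var_p + var_g + (mean_p - mean_g) ** 2
    if denom == 0:
        # Both sequences are constant and identical; CCC is undefined.
        raise ValueError("CCC is undefined for identical constant sequences")
    return 2 * cov / denom
```

Unlike Pearson correlation, CCC penalizes a constant offset between prediction and reference: a trace that tracks the label perfectly but is shifted by a fixed amount scores below 1. This is why CCC values such as those reported above reflect agreement in both trend and scale.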


