Investigations on Audiovisual Emotion Recognition in Noisy Conditions

03/02/2021
by   Michael Neumann, et al.

In this paper, we explore audiovisual emotion recognition under noisy acoustic conditions, with a focus on speech features. We attempt to answer the following research questions: (i) How does speech emotion recognition perform on noisy data? and (ii) To what extent does a multimodal approach improve accuracy and compensate for potential performance degradation at different noise levels? We present an analytical investigation on two emotion datasets with superimposed noise at different signal-to-noise ratios, comparing three types of acoustic features. Visual features are incorporated with a hybrid fusion approach: the first neural network layers are separate and modality-specific, followed by at least one shared layer before the final prediction. The results show a significant performance decrease when a model trained on clean audio is applied to noisy data, and that the addition of visual features alleviates this effect.
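The abstract does not spell out how the noise is superimposed, so the following is only a minimal sketch of one common way to mix a noise signal into speech at a target signal-to-noise ratio. It assumes that SNR is defined over the mean power of the whole utterance and that the noise is repeated cyclically to cover the speech. The function names `mean_power`, `mix_at_snr`, and `measured_snr_db` are hypothetical and not taken from the paper.

```python
import math


def mean_power(signal):
    """Average power (mean of squared samples) of a sequence of floats."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Return speech with noise superimposed at the requested SNR in dB.

    Assumption (not from the paper): SNR is computed over the whole
    utterance, and the noise is looped to match the speech length.
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")

    # Repeat the noise cyclically so it covers every speech sample.
    noise = [noise[i % len(noise)] for i in range(len(speech))]

    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must have non-zero power")

    # Choose gain so that p_speech / (gain^2 * p_noise) = 10^(snr_db / 10).
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB of a noisy signal relative to its clean reference."""
    residual = [y - x for x, y in zip(clean, noisy)]
    return 10 * math.log10(mean_power(clean) / mean_power(residual))
```

Calling `mix_at_snr` on the same clean utterance with several values of `snr_db` gives a set of test conditions at decreasing SNR, and `measured_snr_db` can be used to check that each mixture reaches the requested level.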


