Multimodal Speech Emotion Recognition Using Audio and Text

10/10/2018
by Seunghyun Yoon, et al.

Speech emotion recognition is a challenging task, and extensive reliance has been placed on models that use audio features in building well-performing classifiers. In this paper, we propose a novel deep dual recurrent encoder model that utilizes text data and audio signals simultaneously to obtain a better understanding of speech data. As emotional dialogue is composed of sound and spoken content, our model encodes the information from audio and text sequences using dual recurrent neural networks (RNNs) and then combines the information from these sources to predict the emotion class. This architecture analyzes speech data from the signal level to the language level, and it thus utilizes the information within the data more comprehensively than models that focus on audio features. Extensive experiments are conducted to investigate the efficacy and properties of the proposed model. Our proposed model outperforms previous state-of-the-art methods in assigning data to one of four emotion categories (i.e., angry, happy, sad, and neutral) when the model is applied to the IEMOCAP dataset, as reflected by accuracies ranging from 68.8% to 71.8%.
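The core idea of the dual recurrent encoder can be sketched in a few lines: one recurrent encoder summarizes the audio feature sequence, a second summarizes the text (token-embedding) sequence, and the two final hidden states are concatenated and fed to a softmax over the four emotion classes. The sketch below is a minimal illustration with plain tanh RNN cells and hypothetical feature dimensions, not the paper's exact architecture or training setup.

```python
import numpy as np

def rnn_encode(x_seq, W_xh, W_hh, b_h):
    """Run a simple tanh RNN over x_seq (shape: T x d_in); return the final hidden state."""
    h = np.zeros(W_hh.shape[0])
    for x_t in x_seq:
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
    return h

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def dual_encoder_predict(audio_seq, text_seq, params):
    """Encode both modalities, fuse them, and output emotion-class probabilities."""
    h_audio = rnn_encode(audio_seq, *params["audio"])  # audio-signal encoder
    h_text = rnn_encode(text_seq, *params["text"])     # text-sequence encoder
    fused = np.concatenate([h_audio, h_text])          # combine both modalities
    W_out, b_out = params["out"]
    return softmax(W_out @ fused + b_out)              # 4 emotion probabilities

# Hypothetical sizes: 40-dim audio frames, 100-dim word embeddings,
# 16-dim hidden states, 4 emotion classes (angry, happy, sad, neutral).
rng = np.random.default_rng(0)
d_audio, d_text, d_h, n_classes = 40, 100, 16, 4
params = {
    "audio": (rng.normal(0, 0.1, (d_h, d_audio)),
              rng.normal(0, 0.1, (d_h, d_h)),
              np.zeros(d_h)),
    "text": (rng.normal(0, 0.1, (d_h, d_text)),
             rng.normal(0, 0.1, (d_h, d_h)),
             np.zeros(d_h)),
    "out": (rng.normal(0, 0.1, (n_classes, 2 * d_h)), np.zeros(n_classes)),
}
probs = dual_encoder_predict(rng.normal(size=(50, d_audio)),
                             rng.normal(size=(20, d_text)), params)
print(probs.shape)
```

In the paper itself the encoders are trained jointly so that the text branch compensates where acoustic cues are ambiguous; the sketch only shows the untrained forward pass and fusion step.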

Related research

06/08/2021 | Efficient Speech Emotion Recognition Using Multi-Scale CNN and Attention
Emotion recognition from speech is a challenging task. Recent advances ...

11/20/2022 | Contrastive Regularization for Multimodal Emotion Recognition Using Audio and Text
Speech emotion recognition is a challenge and an important step towards ...

10/20/2019 | Speech Emotion Recognition with Dual-Sequence LSTM Architecture
Speech Emotion Recognition (SER) has emerged as a critical component of ...

07/26/2022 | Multimodal Speech Emotion Recognition using Cross Attention with Aligned Audio and Text
In this paper, we propose a novel speech emotion recognition model calle...

04/30/2022 | Gaze-enhanced Crossmodal Embeddings for Emotion Recognition
Emotional expressions are inherently multimodal – integrating facial beh...

09/05/2023 | Leveraging Label Information for Multimodal Emotion Recognition
Multimodal emotion recognition (MER) aims to detect the emotional status...

06/07/2017 | Characterizing Types of Convolution in Deep Convolutional Recurrent Neural Networks for Robust Speech Emotion Recognition
Deep convolutional neural networks are being actively investigated in a ...
