Learning Discriminative features using Center Loss and Reconstruction as Regularizer for Speech Emotion Recognition

06/19/2019
by Suraj Tripathi, et al.

This paper proposes a Convolutional Neural Network (CNN) for recognizing emotion in speech. The network is inspired by Multitask Learning (MTL), takes speech features as input, and is trained under the joint supervision of softmax loss and center loss, a powerful metric learning strategy. Speech features such as Spectrograms and Mel-frequency Cepstral Coefficients (MFCCs) help retain emotion-related low-level characteristics in speech. We experimented with several Deep Neural Network (DNN) architectures that take speech features as input and trained them under both softmax and center loss, which resulted in highly discriminative features well suited to Speech Emotion Recognition (SER). Our networks are also regularized by simultaneously performing the auxiliary task of reconstructing the input speech features. This sharing of representations among related tasks enables our networks to generalize better on the original task of SER. Some of our proposed networks contain far fewer parameters than state-of-the-art architectures.
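The joint objective described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the weights `lambda_center` and `lambda_recon`, the use of mean squared error for reconstruction, and the fixed class centers are all assumptions made for illustration. Center loss is written in its usual form, half the squared distance between each feature vector and the center of its class, averaged over the batch.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy of softmax(logits) against integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def center_loss(features, labels, centers):
    """Half the squared distance from each feature to its class center, batch-averaged."""
    diffs = features - centers[labels]
    return 0.5 * (diffs ** 2).sum(axis=1).mean()

def reconstruction_loss(inputs, reconstructions):
    """Mean squared error between input features and their reconstruction (assumed form)."""
    return ((inputs - reconstructions) ** 2).mean()

def joint_loss(logits, features, labels, centers, inputs, reconstructions,
               lambda_center=0.1, lambda_recon=1.0):
    """Weighted sum of the three terms; the weights are hypothetical hyperparameters."""
    return (softmax_cross_entropy(logits, labels)
            + lambda_center * center_loss(features, labels, centers)
            + lambda_recon * reconstruction_loss(inputs, reconstructions))
```

In a real training loop the class centers would be updated alongside the network weights; they are held fixed here to keep the sketch short.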

research
06/11/2019

Deep Learning based Emotion Recognition System Using Speech Features and Transcriptions

This paper proposes a speech emotion recognition method based on speech ...
research
06/11/2019

Focal Loss based Residual Convolutional Neural Network for Speech Emotion Recognition

This paper proposes a Residual Convolutional Neural Network (ResNet) bas...
research
02/08/2021

Non-linear frequency warping using constant-Q transformation for speech emotion recognition

In this work, we explore the constant-Q transform (CQT) for speech emoti...
research
06/27/2022

SpeechEQ: Speech Emotion Recognition based on Multi-scale Unified Datasets and Multitask Learning

Speech emotion recognition (SER) has many challenges, but one of the mai...
research
02/27/2018

Deep factorization for speech signal

Various informative factors mixed in speech signals, leading to great di...
research
09/21/2023

The Broad Impact of Feature Imitation: Neural Enhancements Across Financial, Speech, and Physiological Domains

Initialization of neural network weights plays a pivotal role in determi...
