Non-linear frequency warping using constant-Q transformation for speech emotion recognition

02/08/2021
by   Premjeet Singh, et al.

In this work, we explore the constant-Q transform (CQT) for speech emotion recognition (SER). CQT-based time-frequency analysis provides variable spectro-temporal resolution, with higher frequency resolution at lower frequencies. Since the lower-frequency regions of the speech signal contain more emotion-related information than the higher-frequency regions, the increased low-frequency resolution of the CQT makes it more promising for SER than the standard short-time Fourier transform (STFT). We present a comparative analysis of short-term acoustic features based on the STFT and the CQT for SER, with a deep neural network (DNN) as the back-end classifier. We optimize different parameters for both features. The CQT-based features outperform the STFT-based spectral features in the SER experiments. Further experiments with cross-corpora evaluation demonstrate that the CQT-based systems generalize better when trained on out-of-domain data.
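The resolution argument can be made concrete with a minimal sketch that compares where the two transforms place their frequency bins. It uses the standard CQT definitions (geometrically spaced center frequencies and a constant ratio Q of center frequency to bandwidth). The sampling rate, minimum frequency, bins per octave, bin count, and FFT size below are illustrative assumptions, not the settings used in the paper, and the helper names are hypothetical.

```python
# Sketch: frequency-bin placement of a CQT versus an STFT.
# All numeric settings are assumed for illustration only.

def cqt_center_frequencies(f_min, bins_per_octave, n_bins):
    """Geometrically spaced centers: f_k = f_min * 2**(k / bins_per_octave)."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


def cqt_q_factor(bins_per_octave):
    """Constant ratio of center frequency to bandwidth for every CQT bin."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)


def stft_center_frequencies(sample_rate, n_fft):
    """Linearly spaced centers from 0 Hz to the Nyquist frequency."""
    return [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]


def count_below(freqs, cutoff_hz):
    """Number of bins whose center frequency lies strictly below cutoff_hz."""
    return sum(1 for f in freqs if f < cutoff_hz)


SAMPLE_RATE = 16000      # Hz (assumed)
F_MIN = 62.5             # Hz, lowest CQT bin (assumed)
BINS_PER_OCTAVE = 24     # assumed
N_BINS = 168             # 7 octaves, top bin stays below Nyquist (assumed)
N_FFT = 512              # STFT size (assumed)

cqt_freqs = cqt_center_frequencies(F_MIN, BINS_PER_OCTAVE, N_BINS)
q = cqt_q_factor(BINS_PER_OCTAVE)
cqt_bandwidths = [f / q for f in cqt_freqs]   # grows with frequency

stft_freqs = stft_center_frequencies(SAMPLE_RATE, N_FFT)
stft_spacing = SAMPLE_RATE / N_FFT            # same for every STFT bin

print("CQT bins below 1 kHz:", count_below(cqt_freqs, 1000.0), "of", len(cqt_freqs))
print("STFT bins below 1 kHz:", count_below(stft_freqs, 1000.0), "of", len(stft_freqs))
print("CQT bandwidth at lowest/highest bin (Hz):", cqt_bandwidths[0], cqt_bandwidths[-1])
print("STFT bin spacing (Hz):", stft_spacing)
```

With these assumed settings, the CQT places 96 of its 168 bins below 1 kHz, while the STFT places 32 of its 257 bins there. The CQT bandwidth is narrower than the STFT bin spacing at the lowest bin and wider at the highest, which is the non-linear frequency warping the abstract refers to.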

research
11/29/2022

Analysis of constant-Q filterbank based representations for speech emotion recognition

This work analyzes the constant-Q filterbank-based time-frequency repres...
research
08/07/2019

Pitch-Synchronous Single Frequency Filtering Spectrogram for Speech Emotion Recognition

Convolutional neural networks (CNN) are widely used for speech emotion r...
research
02/21/2019

STFNets: Learning Sensing Signals from the Time-Frequency Perspective with Short-Time Fourier Neural Networks

Recent advances in deep learning motivate the use of deep neural network...
research
03/31/2016

Learning Multiscale Features Directly From Waveforms

Deep learning has dramatically improved the performance of speech recogn...
research
01/14/2023

Modulation spectral features for speech emotion recognition using deep neural networks

This work explores the use of constant-Q transform based modulation spec...
research
06/19/2019

Learning Discriminative features using Center Loss and Reconstruction as Regularizer for Speech Emotion Recognition

This paper proposes a Convolutional Neural Network (CNN) inspired by Mul...
research
05/11/2021

Deep scattering network for speech emotion recognition

This paper introduces scattering transform for speech emotion recognitio...
