Variational Autoencoders for Learning Latent Representations of Speech Emotion
Learning latent representations of data in an unsupervised fashion is a useful process: it can yield more relevant features that enhance the performance of a downstream classifier. For speech emotion recognition, generating effective features is crucial. Recently, deep generative models such as Variational Autoencoders (VAEs) have achieved great success in modeling natural images. Inspired by this, in this paper we use VAEs to model emotions in human speech. We derive a latent representation of the speech signal and use it for emotion classification. We demonstrate that features learned by VAEs can achieve state-of-the-art emotion recognition results.
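The core VAE machinery the abstract refers to can be illustrated with a minimal sketch: an encoder maps input features to the mean and log-variance of an approximate posterior q(z|x), a latent z is sampled via the reparameterization trick, and a closed-form KL term regularizes q(z|x) toward a standard normal prior. The linear encoder, the 20-dimensional toy "speech features", and the 8-dimensional latent size below are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W_mu, W_logvar):
    # Toy linear encoder producing mean and log-variance of q(z|x).
    # (A real model would use a deep network over speech features.)
    return x @ W_mu, x @ W_logvar

def reparameterize(mu, logvar, rng):
    # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I),
    # which keeps sampling differentiable with respect to mu and sigma.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_divergence(mu, logvar):
    # Closed-form KL(q(z|x) || N(0, I)), summed over latent dimensions.
    return -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=-1)

# Hypothetical batch: 4 frames of 20-dim acoustic features, 8-dim latent space.
x = rng.standard_normal((4, 20))
W_mu = 0.1 * rng.standard_normal((20, 8))
W_logvar = 0.1 * rng.standard_normal((20, 8))

mu, logvar = encode(x, W_mu, W_logvar)
z = reparameterize(mu, logvar, rng)   # latent representation fed to a classifier
kl = kl_divergence(mu, logvar)        # one nonnegative KL value per frame
print(z.shape, kl.shape)
```

In a full training loop, the KL term would be added to a reconstruction loss (the evidence lower bound), and the learned z vectors would serve as input features for the emotion classifier.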