Data Augmentation with Unsupervised Speaking Style Transfer for Speech Emotion Recognition

11/16/2022
by   Leyuan Qu, et al.
0

Currently, the performance of Speech Emotion Recognition (SER) systems is mainly constrained by the absence of large-scale labelled corpora. Data augmentation is regarded as a promising approach, which borrows methods from Automatic Speech Recognition (ASR), for instance, perturbation on speed and pitch, or generating emotional speech utilizing generative adversarial networks. In this paper, we propose EmoAug, a novel style transfer model to augment emotion expressions, in which a semantic encoder and a paralinguistic encoder represent verbal and non-verbal information respectively. Additionally, a decoder reconstructs speech signals by conditioning on the aforementioned two information flows in an unsupervised fashion. Once training is completed, EmoAug enriches expressions of emotional speech in different prosodic attributes, such as stress, rhythm and intensity, by feeding different styles into the paralinguistic encoder. In addition, we can also generate similar numbers of samples for each class to tackle the data imbalance issue. Experimental results on the IEMOCAP dataset demonstrate that EmoAug can successfully transfer different speaking styles while retaining the speaker identity and semantic content. Furthermore, we train a SER model with data augmented by EmoAug and show that it not only surpasses the state-of-the-art supervised and self-supervised methods but also overcomes overfitting problems caused by data imbalance. Some audio samples can be found on our demo website.

READ FULL TEXT

page 1

page 3

page 4

research
12/14/2022

Disentangling Prosody Representations with Unsupervised Speech Reconstruction

Human speech can be characterized by different components, including sem...
research
10/28/2020

Seen and Unseen emotional style transfer for voice conversion with a new emotional speech dataset

Emotional voice conversion aims to transform emotional prosody in speech...
research
09/19/2023

Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition

In this paper, we explored how to boost speech emotion recognition (SER)...
research
05/18/2020

Augmenting Generative Adversarial Networks for Speech Emotion Recognition

Generative adversarial networks (GANs) have shown potential in learning ...
research
05/19/2023

A Preliminary Study on Augmenting Speech Emotion Recognition using a Diffusion Model

In this paper, we propose to utilise diffusion models for data augmentat...
research
09/06/2022

Read it to me: An emotionally aware Speech Narration Application

In this work we try to perform emotional style transfer on audios. In pa...
research
02/03/2021

Speech Emotion Recognition with Multiscale Area Attention and Data Augmentation

In Speech Emotion Recognition (SER), emotional characteristics often app...

Please sign up or login with your details

Forgot password? Click here to reset