Going Retro: Astonishingly Simple Yet Effective Rule-based Prosody Modelling for Speech Synthesis Simulating Emotion Dimensions

07/05/2023
by Felix Burkhardt, et al.

We introduce two rule-based models that modify the prosody of speech synthesis in order to modulate the expressed emotion. The prosody modulation is based on the Speech Synthesis Markup Language (SSML) and can be used with any commercial speech synthesizer. Both models and the optimization results are evaluated against human emotion annotations. The results indicate that, even with a very simple method, both emotion dimensions, arousal (.76 unweighted average recall, UAR) and valence (.43 UAR), can be simulated.
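To make the general idea concrete, here is a minimal Python sketch of rule-based SSML prosody modulation. It is not the authors' actual rule set; the mapping from arousal and valence to the SSML prosody attributes (rate, pitch, volume) and the function name emotion_to_ssml are illustrative assumptions only.

# Minimal sketch (not the paper's exact rules): map arousal/valence in [-1, 1]
# to SSML <prosody> attributes that an SSML-capable synthesizer can consume.
def emotion_to_ssml(text: str, arousal: float, valence: float) -> str:
    """Wrap `text` in an SSML <prosody> tag derived from emotion dimensions."""
    # Assumed mapping: higher arousal -> faster rate, higher pitch, louder volume;
    # valence nudges pitch slightly upward or downward.
    rate_pct = int(arousal * 30)                   # up to +/- 30% speaking rate
    pitch_pct = int(arousal * 20 + valence * 10)   # up to +/- 30% pitch shift
    volume_db = round(arousal * 6.0, 1)            # up to +/- 6 dB loudness

    prosody = (
        f'<prosody rate="{rate_pct:+d}%" '
        f'pitch="{pitch_pct:+d}%" '
        f'volume="{volume_db:+.1f}dB">{text}</prosody>'
    )
    return f"<speak>{prosody}</speak>"

if __name__ == "__main__":
    # High-arousal, positive-valence example (roughly elated speech).
    print(emotion_to_ssml("I can hardly believe it!", arousal=0.8, valence=0.6))

The resulting SSML string can be passed to any synthesizer that accepts SSML input; only the rule table deciding the attribute values would need to change to simulate a different point in the arousal-valence space.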
