Analysis and Assessment of Controllability of an Expressive Deep Learning-based TTS system

03/06/2021
by   Noé Tits, et al.

In this paper, we study the controllability of an Expressive TTS system trained on a dataset for continuous control. The dataset is the Blizzard 2013 dataset, which consists of audiobooks read by a female speaker and exhibits great variability in style and expressiveness. Controllability is evaluated with both an objective and a subjective experiment. The objective assessment is based on a measure of correlation between acoustic features and the dimensions of the latent space representing expressiveness. The subjective assessment is based on a perceptual experiment in which users are shown an interface for Controllable Expressive TTS and asked to retrieve a synthetic utterance whose expressiveness subjectively corresponds to that of a reference utterance.
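The objective assessment described in the abstract can be illustrated with a minimal sketch. The abstract does not say which correlation measure or which acoustic features are used, so the Pearson coefficient, the function name `latent_feature_correlation`, and the synthetic "mean F0" and "energy" columns below are all assumptions made for illustration, not the authors' method.

```python
import numpy as np

def latent_feature_correlation(latents, features):
    """Pearson correlation between each latent dimension and each acoustic feature.

    latents:  array of shape (n_utterances, n_latent_dims)
    features: array of shape (n_utterances, n_features)
    returns:  array of shape (n_latent_dims, n_features)
    """
    latents = np.asarray(latents, dtype=float)
    features = np.asarray(features, dtype=float)
    # Center every column so that dot products become covariances (up to scale).
    lz = latents - latents.mean(axis=0)
    fz = features - features.mean(axis=0)
    # Numerator: cross-products; denominator: product of column norms.
    num = lz.T @ fz
    denom = np.outer(np.linalg.norm(lz, axis=0), np.linalg.norm(fz, axis=0))
    return num / denom

# Synthetic stand-in data (hypothetical): latent dimension 0 drives "mean F0",
# while "energy" is unrelated noise.
rng = np.random.default_rng(0)
latents = rng.normal(size=(200, 2))
mean_f0 = 3.0 * latents[:, 0] + 0.1 * rng.normal(size=200)
energy = rng.normal(size=200)
features = np.column_stack([mean_f0, energy])

corr = latent_feature_correlation(latents, features)
print(corr.round(2))
```

Under these assumptions, a latent dimension that controls an acoustic property shows a coefficient close to 1 or -1 in the corresponding cell, while unrelated pairs stay near 0.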

research
03/24/2018

Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron

We present an extension to the Tacotron speech synthesis architecture th...
research
08/25/2020

ICE-Talk: an Interface for a Controllable Expressive Talking Machine

ICE-Talk is an open source web-based GUI that allows the use of a TTS sy...
research
11/04/2022

NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS

Expressive text-to-speech (TTS) can synthesize a new speaking style by i...
research
11/01/2022

Learning utterance-level representations through token-level acoustic latents prediction for Expressive Speech Synthesis

This paper proposes an Expressive Speech Synthesis model that utilizes t...
research
01/31/2023

InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt

Expressive text-to-speech (TTS) aims to synthesize different speaking st...
research
11/24/2022

Prosody-controllable spontaneous TTS with neural HMMs

Spontaneous speech has many affective and pragmatic functions that are i...
research
08/04/2020

Expressive TTS Training with Frame and Style Reconstruction Loss

We propose a novel training strategy for Tacotron-based text-to-speech (...
