Visualising Model Training via Vowel Space for Text-To-Speech Systems

08/21/2022
by Binu Abeysinghe, et al.

With the recent developments in speech synthesis via machine learning, this study explores incorporating linguistic knowledge to visualise and evaluate the training of synthetic speech models. If changes to the first and second formants (and, in turn, the vowel space) can be seen and heard in synthetic speech, this knowledge can inform developers of speech synthesis technology. A speech synthesis model trained on a large General American English database was fine-tuned into a New Zealand English voice to determine whether changes in the vowel space of the synthetic speech could be seen and heard. The vowel spaces at different intervals during fine-tuning were analysed to determine whether the model learned the New Zealand English vowel space. Our findings, based on vowel space analysis, show that we can visualise how a speech synthesis model learns the vowel space of the database it is trained on. Perception tests confirmed that humans can perceive when a speech synthesis model has learned characteristics of the speech database it is trained on. Using the vowel space as an intermediate evaluation helps developers understand which sounds should be added to the training database and build speech synthesis models informed by linguistic knowledge.
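To make the idea of tracking a vowel space across fine-tuning checkpoints concrete, the following is a minimal sketch and not the authors' method. It assumes that first and second formant values (F1, F2, in Hz) have already been measured for labelled vowel tokens by some external tool. The function names `vowel_centroids` and `mean_centroid_distance` are hypothetical, and the use of unnormalised Euclidean distance in Hz is an assumption made only for illustration.

```python
from math import hypot
from statistics import mean


def vowel_centroids(tokens):
    """Average F1/F2 per vowel label.

    tokens: iterable of (vowel_label, f1_hz, f2_hz) tuples, assumed to be
    measured beforehand (formant extraction is out of scope here).
    Returns {vowel_label: (mean_f1_hz, mean_f2_hz)}.
    """
    grouped = {}
    for label, f1, f2 in tokens:
        grouped.setdefault(label, []).append((f1, f2))
    return {
        label: (mean(f1 for f1, _ in pts), mean(f2 for _, f2 in pts))
        for label, pts in grouped.items()
    }


def mean_centroid_distance(centroids_a, centroids_b):
    """Mean Euclidean distance (Hz) between centroids of shared vowels.

    Assumption: raw Hz distance without speaker normalisation; a real
    analysis may prefer a normalised or perceptual scale.
    """
    shared = sorted(set(centroids_a) & set(centroids_b))
    if not shared:
        raise ValueError("no vowel labels in common")
    return mean(
        hypot(
            centroids_a[v][0] - centroids_b[v][0],
            centroids_a[v][1] - centroids_b[v][1],
        )
        for v in shared
    )
```

Under these assumptions, one could compute the centroids for synthetic speech generated at each saved checkpoint and compare them with centroids from the target-accent recordings; a distance that shrinks over checkpoints would suggest the model's vowel space is moving towards the target. Plotting the centroids with F2 on the horizontal axis and F1 on the vertical axis gives the conventional vowel-space view.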


