High-Quality Vocoding Design with Signal Processing for Speech Synthesis and Voice Conversion

01/25/2021
by Mohammed Salah Al-Radhi, et al.

This Ph.D. thesis focuses on developing a system for high-quality speech synthesis and voice conversion. Vocoder-based speech analysis, manipulation, and synthesis play a crucial role in many kinds of statistical parametric speech research. Although some vocoding methods yield synthesized speech that is close to natural, they are typically computationally expensive and thus unsuitable for real-time implementation, especially in embedded environments. There is therefore a need for simple, computationally feasible digital signal processing algorithms that generate high-quality, natural-sounding synthesized speech. In this dissertation, I propose a method for extracting optimal acoustic features and a new waveform generator that together achieve higher sound quality and conversion accuracy by applying advances in deep learning, while remaining computationally efficient. This work resulted in five thesis groups, which are briefly summarized below.

research
04/05/2019

WaveCycleGAN2: Time-domain Neural Post-filter for Speech Waveform Generation

WaveCycleGAN has recently been proposed to bridge the gap between natura...
research
02/17/2020

Lifter Training and Sub-band Modeling for Computationally Efficient and High-Quality Voice Conversion Using Spectral Differentials

In this paper, we propose computationally efficient and high-quality met...
research
02/22/2022

Wavebender GAN: An architecture for phonetically meaningful speech manipulation

Deep learning has revolutionised synthetic speech quality. However, it h...
research
01/19/2018

Epoch-Synchronous Overlap-Add (ESOLA) for Time- and Pitch-Scale Modification of Speech Signals

Time- and pitch-scale modifications of speech signals find important app...
research
09/25/2018

WaveCycleGAN: Synthetic-to-natural speech waveform conversion using cycle-consistent adversarial networks

We propose a learning-based filter that allows us to directly modify a s...
research
09/13/2022

Deep Speech Synthesis from Articulatory Representations

In the articulatory synthesis task, speech is synthesized from input fea...
research
10/21/2020

Grapheme or phoneme? An Analysis of Tacotron's Embedded Representations

End-to-end models, particularly Tacotron-based ones, are currently a pop...
