Improving Trajectory Modelling for DNN-based Speech Synthesis by using Stacked Bottleneck Features and Minimum Generation Error Training

02/22/2016
by Zhizheng Wu, et al.

We propose two novel techniques --- stacking bottleneck features and a minimum generation error training criterion --- to improve the performance of deep neural network (DNN)-based speech synthesis. The techniques address two related shortcomings of current typical DNN-based synthesis frameworks: frame-by-frame independence, and the neglect of the relationship between static and dynamic features. Stacking bottleneck features, which are an acoustically informed linguistic representation, provides an efficient way to include more detailed linguistic context at the input. The minimum generation error training criterion minimises the overall output trajectory error across an utterance, rather than minimising the error of each frame independently, and thus takes into account the interaction between static and dynamic features. The two techniques can be easily combined to further improve performance. We present both objective and subjective results that demonstrate the effectiveness of the proposed techniques. The subjective results show that combining the two techniques leads to significantly more natural synthetic speech than from conventional DNN or long short-term memory (LSTM) recurrent neural network (RNN) systems.
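To make the two ideas concrete, the following is a minimal numpy sketch, not the authors' implementation: `stack_bottleneck` concatenates bottleneck features from neighbouring frames onto each frame's input, and `mge_loss` scores the *generated* static trajectory (recovered via parameter-generation smoothing with a delta window) against the target, rather than the raw per-frame outputs. The single-feature-dimension, identity-covariance setup and the (-0.5, 0, 0.5) delta window are simplifying assumptions for illustration.

```python
import numpy as np

def stack_bottleneck(bn, context=4):
    """Concatenate bottleneck features of +/-context neighbouring frames
    onto each frame (edge frames padded by repetition).
    Input: (T, D) array; output: (T, (2*context + 1) * D) array."""
    T, _ = bn.shape
    padded = np.pad(bn, ((context, context), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + T] for i in range(2 * context + 1)])

def delta_matrix(T):
    """Window matrix W mapping a length-T static trajectory to stacked
    [static; delta] features (length 2T), delta window (-0.5, 0, 0.5).
    Simplification: one feature dimension, identity covariance."""
    W = np.zeros((2 * T, T))
    for t in range(T):
        W[2 * t, t] = 1.0                  # static coefficient
        if t > 0:
            W[2 * t + 1, t - 1] = -0.5     # left half of delta window
        if t < T - 1:
            W[2 * t + 1, t + 1] = 0.5      # right half of delta window
    return W

def mlpg(o_pred, W):
    """Parameter generation: solve for the smooth static trajectory
    most consistent with the predicted static+delta features."""
    return np.linalg.solve(W.T @ W, W.T @ o_pred)

def mge_loss(o_pred, c_target, W):
    """Minimum generation error: squared error of the generated
    trajectory, so static/dynamic interaction enters the training loss."""
    c_hat = mlpg(o_pred, W)
    return np.mean((c_hat - c_target) ** 2)
```

Note the key design point the abstract emphasises: because `mge_loss` passes the network's outputs through the generation step before comparing to the target, the error is defined over the whole utterance trajectory rather than frame by frame.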


