Linear networks based speaker adaptation for speech synthesis

03/05/2018
by Zhiying Huang, et al.

Speaker adaptation methods aim to create a synthetic voice font of fair quality for a target speaker when only limited resources are available. Recently, as deep neural network based statistical parametric speech synthesis (SPSS) methods have become dominant in TTS back-end modeling, speaker adaptation under the neural network based SPSS framework has also become an important task. In this paper, linear networks (LN) are inserted in multiple neural network layers and fine-tuned together with the output layer for the best speaker adaptation performance. When adaptation data is extremely scarce, the low-rank plus diagonal (LRPD) decomposition for LN is employed to make the adapted voice more stable. Speaker adaptation experiments are conducted with varying numbers of adaptation utterances. Moreover, speaker adaptation from 1) female to female, 2) male to female and 3) female to male is investigated. Objective measurements and subjective tests show that LN with LRPD decomposition is the most stable when adaptation data is extremely limited, and that our best speaker adaptation (SA) model with only 200 adaptation utterances achieves quality comparable to a speaker-dependent (SD) model trained with 1000 utterances, in both naturalness and similarity to the target speaker.
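
To make the LRPD idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the common form of the decomposition, in which a square LN weight matrix is written as W = D + PQ with D diagonal, P of size dim x rank and Q of size rank x dim. The class name `LRPDLinear`, the helper `full_ln_params`, the layer width of 1024, the rank of 8 and the identity initialization are all illustrative assumptions and do not come from the abstract.

```python
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class LRPDLinear:
    """Hypothetical linear network layer with W = D + P @ Q (assumed form)."""

    def __init__(self, dim, rank, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.rank = rank
        # Assumption: start as the identity map so that inserting the layer
        # leaves the speaker-independent model's output unchanged.
        self.diag = [1.0] * dim
        self.p = [[0.0] * rank for _ in range(dim)]
        self.q = [[rng.gauss(0.0, 0.01) for _ in range(dim)] for _ in range(rank)]
        self.bias = [0.0] * dim

    def forward(self, hidden):
        # Low-rank path: project down to `rank` dimensions, then back up.
        low_rank = matvec(self.p, matvec(self.q, hidden))
        return [d * x + l + b
                for d, x, l, b in zip(self.diag, hidden, low_rank, self.bias)]

    def num_params(self):
        # diagonal + P + Q + bias
        return self.dim + 2 * self.dim * self.rank + self.dim


def full_ln_params(dim):
    """Parameter count of an unconstrained LN (full square matrix plus bias)."""
    return dim * dim + dim


layer = LRPDLinear(dim=1024, rank=8)
print(layer.num_params(), full_ln_params(1024))
```

Under these assumed sizes the decomposed layer has far fewer speaker-specific parameters than a full square LN, which is one plausible reading of why it behaves more stably when adaptation data is extremely limited.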


research
08/02/2021

Speaker Adaptation with Continuous Vocoder-based DNN-TTS

Traditional vocoder-based statistical parametric speech synthesis can be...
research
11/17/2021

Rapping-Singing Voice Synthesis based on Phoneme-level Prosody Control

In this paper, a text-to-rapping/singing system is introduced, which can...
research
07/31/2018

Scaling and bias codes for modeling speaker-adaptive DNN-based speech synthesis systems

Most neural-network based speaker-adaptive acoustic models for speech sy...
research
05/24/2022

TDASS: Target Domain Adaptation Speech Synthesis Framework for Multi-speaker Low-Resource TTS

Recently, synthesizing personalized speech by text-to-speech (TTS) appli...
research
06/18/2019

A Unified Speaker Adaptation Method for Speech Synthesis using Transcribed and Untranscribed Speech with Backpropagation

By representing speaker characteristic as a single fixed-length vector e...
research
07/16/2018

Subjective and objective experiments on the influence of speaker's gender on the unvoiced segments

Subjective and objective experiments are conducted to understand the ext...
research
10/28/2022

Residual Adapters for Few-Shot Text-to-Speech Speaker Adaptation

Adapting a neural text-to-speech (TTS) model to a target speaker typical...
