Improving Sequence-to-Sequence Acoustic Modeling by Adding Text-Supervision

11/20/2018
by   Jing-Xuan Zhang, et al.

This paper presents methods that use text supervision to improve the performance of sequence-to-sequence (seq2seq) voice conversion. Compared with conventional frame-to-frame voice conversion approaches, the seq2seq acoustic modeling method proposed in our previous work achieved higher naturalness and similarity. In this paper, we further improve its performance by utilizing the text transcriptions of the parallel training data. First, a multi-task learning structure is designed that adds auxiliary classifiers to the middle layers of the seq2seq model and predicts linguistic labels as a secondary task. Second, a data-augmentation method is proposed that uses text alignment to produce extra parallel sequences for model training. Experiments are conducted to evaluate the proposed methods with training sets of different sizes. The results show that multi-task learning with linguistic labels is effective at reducing the errors of seq2seq voice conversion, and that the data-augmentation method further improves performance when only 50 or 100 training utterances are available.
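
The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `aux_weight` trade-off, the mean-squared-error main loss, and the choice to cut fragments at every contiguous run of aligned text units are all assumptions made for illustration. The first function combines an acoustic loss with an auxiliary linguistic-label classification loss; the second cuts extra parallel fragments out of one utterance pair using per-unit frame spans obtained from a text alignment.

```python
import math


def multitask_loss(pred_frames, target_frames, label_probs, label_ids,
                   aux_weight=0.5):
    """Main acoustic loss plus a weighted auxiliary classification loss.

    pred_frames / target_frames: lists of equal-length feature vectors.
    label_probs: per-step probability distributions from a hypothetical
        auxiliary classifier attached to a middle layer.
    label_ids: reference linguistic label index for each step.
    aux_weight: assumed trade-off weight (not taken from the paper).
    """
    # Mean squared error over all acoustic feature values.
    n_values = sum(len(frame) for frame in target_frames)
    mse = sum((p - t) ** 2
              for pf, tf in zip(pred_frames, target_frames)
              for p, t in zip(pf, tf)) / n_values
    # Mean cross-entropy of the auxiliary linguistic-label predictions.
    ce = -sum(math.log(probs[y])
              for probs, y in zip(label_probs, label_ids)) / len(label_ids)
    return mse + aux_weight * ce


def augment_with_alignment(src_frames, tgt_frames, src_spans, tgt_spans):
    """Cut extra parallel fragments at shared text-unit boundaries.

    src_spans / tgt_spans: one (start, end) frame range per text unit,
    end exclusive, listed in the same unit order for both speakers.
    Returns (source_fragment, target_fragment) pairs for every contiguous
    run of units except the full utterance.
    """
    if len(src_spans) != len(tgt_spans):
        raise ValueError("both utterances must align to the same text units")
    n = len(src_spans)
    pairs = []
    for i in range(n):
        for j in range(i, n):
            if i == 0 and j == n - 1:
                continue  # the full utterance is already a training pair
            s0, s1 = src_spans[i][0], src_spans[j][1]
            t0, t1 = tgt_spans[i][0], tgt_spans[j][1]
            pairs.append((src_frames[s0:s1], tgt_frames[t0:t1]))
    return pairs
```

For an utterance pair aligned to three text units, `augment_with_alignment` yields five additional fragment pairs, which suggests why this kind of augmentation matters most when only a small number of training utterances is available.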

research
06/25/2019

Non-Parallel Sequence-to-Sequence Voice Conversion with Disentangled Linguistic and Speaker Representations

In this paper, a method for non-parallel sequence-to-sequence (seq2seq) ...
research
10/16/2018

Sequence-to-Sequence Acoustic Modeling for Voice Conversion

In this paper, a neural network named Sequence-to-sequence ConvErsion N...
research
10/19/2022

Two-stage training method for Japanese electrolaryngeal speech enhancement based on sequence-to-sequence voice conversion

Sequence-to-sequence (seq2seq) voice conversion (VC) models have greater...
research
01/06/2020

Mel-spectrogram augmentation for sequence to sequence voice conversion

When training the sequence-to-sequence voice conversion model, we need t...
research
09/30/2019

Semi-supervised voice conversion with amortized variational inference

In this work we introduce a semi-supervised approach to the voice conver...
research
06/18/2020

Adversarially Trained Multi-Singer Sequence-To-Sequence Singing Synthesizer

This paper presents a high quality singing synthesizer that is able to m...
research
04/13/2019

Unsupervised Singing Voice Conversion

We present a deep learning method for singing voice conversion. The prop...
