Adversarial Multi-Task Learning for Disentangling Timbre and Pitch in Singing Voice Synthesis

06/23/2022
by Tae-Woo Kim, et al.

Deep learning-based generative models have recently been introduced to generate singing voices. One approach predicts parametric vocoder features, which consist of explicit speech parameters; its advantage is that the meaning of each feature is explicitly distinguished. Another approach predicts mel-spectrograms for a neural vocoder. However, parametric vocoders are limited in voice quality, and mel-spectrogram features are difficult to model because timbre and pitch information are entangled. In this study, we propose a singing voice synthesis model that uses multi-task learning to combine both approaches: acoustic features for a parametric vocoder and mel-spectrograms for a neural vocoder. By using the parametric vocoder features as auxiliary features, the proposed model can efficiently disentangle and control the timbre and pitch components of the mel-spectrogram. Moreover, a generative adversarial network framework is applied to improve the quality of singing voices in a multi-singer model. Experimental results demonstrate that the proposed model can generate more natural singing voices than the single-task models and outperforms the conventional parametric vocoder-based model.
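
To make the multi-task idea concrete, the following is a minimal sketch of how a generator objective of this kind might be assembled: a main mel-spectrogram reconstruction term, an auxiliary term on parametric vocoder features, and an adversarial term. The abstract does not give the loss formulas, so everything specific here is an assumption for illustration: the L1 reconstruction losses, the least-squares GAN form of the adversarial term, the weights `aux_weight` and `adv_weight`, and all function names are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def l1_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error between two equal-length, non-empty feature vectors."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def lsgan_generator_loss(disc_scores_on_fake: Sequence[float]) -> float:
    """Least-squares GAN generator term (an assumed choice):
    penalizes discriminator scores on generated samples for deviating from 1."""
    if len(disc_scores_on_fake) == 0:
        raise ValueError("disc_scores_on_fake must be non-empty")
    return sum((s - 1.0) ** 2 for s in disc_scores_on_fake) / len(disc_scores_on_fake)


def multitask_generator_loss(
    mel_pred: Sequence[float],
    mel_target: Sequence[float],
    aux_pred: Sequence[float],
    aux_target: Sequence[float],
    disc_scores_on_fake: Sequence[float],
    aux_weight: float = 1.0,   # hypothetical weight for the auxiliary task
    adv_weight: float = 0.1,   # hypothetical weight for the adversarial term
) -> float:
    """Weighted sum of the main task, the auxiliary task, and the adversarial term."""
    mel_term = l1_loss(mel_pred, mel_target)    # main task: mel-spectrogram
    aux_term = l1_loss(aux_pred, aux_target)    # auxiliary task: vocoder features
    adv_term = lsgan_generator_loss(disc_scores_on_fake)
    return mel_term + aux_weight * aux_term + adv_weight * adv_term


if __name__ == "__main__":
    # Toy values only; real inputs would be frame-level acoustic features.
    loss = multitask_generator_loss(
        mel_pred=[0.0, 0.0],
        mel_target=[1.0, 1.0],
        aux_pred=[2.0],
        aux_target=[2.0],
        disc_scores_on_fake=[0.0],
        adv_weight=0.5,
    )
    print(loss)
```

In a sketch like this, the auxiliary term is what ties the shared representation to explicitly interpretable parameters such as pitch, which is the role the abstract assigns to the parametric vocoder features; the relative weights would need tuning in practice.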
