Mandarin Singing Voice Synthesis with Denoising Diffusion Probabilistic Wasserstein GAN

09/21/2022
by Yin-Ping Cho, et al.

Singing voice synthesis (SVS) is the computer generation of a human-like singing voice from a given musical score. To accomplish end-to-end SVS effectively and efficiently, this work adopts the acoustic model plus neural vocoder architecture established for high-quality speech and singing voice synthesis. Specifically, this work pursues a higher level of expressiveness in synthesized voices by combining the denoising diffusion probabilistic model (DDPM) and the Wasserstein generative adversarial network (WGAN) to construct the backbone of the acoustic model. On top of the proposed acoustic model, a HiFi-GAN neural vocoder is adopted with integrated fine-tuning to ensure optimal synthesis quality for the resulting end-to-end SVS system. This end-to-end system was evaluated on the multi-singer Mpop600 Mandarin singing voice dataset. In the experiments, the proposed system improves on previous landmark counterparts in terms of musical expressiveness and high-frequency acoustic detail. Moreover, the adversarial acoustic model converged stably without any reconstruction objective, indicating that the combined DDPM and WGAN architecture converges more stably than alternative GAN-based SVS systems.
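The abstract does not spell out how the DDPM and WGAN components are wired together. The sketch below assumes a denoising-diffusion-GAN-style formulation, which is an assumption and not confirmed by the abstract: a generator predicts a clean mel-spectrogram x_0 from a noisy x_t, the sample x_{t-1} is then drawn from the closed-form DDPM posterior q(x_{t-1} | x_t, x_0), and a critic trained with the Wasserstein loss scores the result. The step count `T`, the linear beta schedule, the 80-bin mel shape, and every function name are hypothetical. The generator and critic networks, and the Lipschitz constraint a WGAN critic needs (for example a gradient penalty), are omitted.

```python
import numpy as np

# Hypothetical schedule: a few large diffusion steps; the paper's actual
# step count and noise schedule are not stated in the abstract.
T = 4
betas = np.linspace(0.1, 0.6, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)                          # \bar{alpha}_t
alpha_bar_prev = np.concatenate([[1.0], alpha_bar[:-1]])  # \bar{alpha}_{t-1}, with 1.0 before step 0


def q_sample(x0, t, noise):
    """Forward diffusion: draw x_t ~ q(x_t | x_0) in closed form (0-based step t)."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise


def posterior_mean_var(x0, x_t, t):
    """Mean and variance of the DDPM posterior q(x_{t-1} | x_t, x_0)."""
    coef_x0 = betas[t] * np.sqrt(alpha_bar_prev[t]) / (1.0 - alpha_bar[t])
    coef_xt = (1.0 - alpha_bar_prev[t]) * np.sqrt(alphas[t]) / (1.0 - alpha_bar[t])
    mean = coef_x0 * x0 + coef_xt * x_t
    var = betas[t] * (1.0 - alpha_bar_prev[t]) / (1.0 - alpha_bar[t])
    return mean, var


def reverse_step(x0_pred, x_t, t, rng):
    """One reverse step: plug the generator's x_0 estimate into the posterior and sample."""
    mean, var = posterior_mean_var(x0_pred, x_t, t)
    return mean + np.sqrt(var) * rng.standard_normal(x_t.shape)


def critic_loss(real_scores, fake_scores):
    """WGAN critic objective to minimise: E[D(fake)] - E[D(real)]."""
    return float(np.mean(fake_scores) - np.mean(real_scores))


def generator_loss(fake_scores):
    """WGAN generator objective to minimise: -E[D(fake)]."""
    return float(-np.mean(fake_scores))


# Usage with a hypothetical 80-bin x 100-frame mel-spectrogram and an
# "ideal" generator that returns the true x_0.
rng = np.random.default_rng(0)
x0 = rng.standard_normal((80, 100))
t = T - 1
x_t = q_sample(x0, t, rng.standard_normal(x0.shape))
x_prev = reverse_step(x0, x_t, t, rng)
```

Because each reverse step only needs the generator's estimate of x_0, a handful of large steps can replace the hundreds of small steps a plain DDPM uses; the adversarial critic is what lets each large step's posterior stay expressive rather than Gaussian-limited. In a denoising diffusion GAN the critic typically scores the pair (x_{t-1}, x_t) together with the step index, which this sketch leaves out.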

research
10/14/2021

SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation

High-fidelity singing voice synthesis is challenging for neural vocoders...
research
05/06/2021

DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism

Singing voice synthesis (SVS) system is built to synthesize high-quality...
research
10/17/2021

VISinger: Variational Inference with Adversarial Learning for End-to-End Singing Voice Synthesis

In this paper, we propose VISinger, a complete end-to-end high-quality s...
research
03/26/2019

WGANSing: A Multi-Voice Singing Voice Synthesizer Based on the Wasserstein-GAN

We present a deep neural network based singing voice synthesizer, inspir...
research
10/18/2021

KaraTuner: Towards end to end natural pitch correction for singing voice in karaoke

An automatic pitch correction system typically includes several stages, ...
research
10/14/2022

Hierarchical Diffusion Models for Singing Voice Neural Vocoder

Recent progress in deep generative models has improved the quality of ne...
research
04/15/2019

Singing voice synthesis based on convolutional neural networks

The present paper describes a singing voice synthesis based on convoluti...