Parallel waveform synthesis based on generative adversarial networks with voicing-aware conditional discriminators

10/27/2020
by Ryuichi Yamamoto, et al.

This paper proposes voicing-aware conditional discriminators for Parallel WaveGAN-based waveform synthesis systems. In this framework, we adopt a projection-based conditioning method that can significantly improve the discriminator's performance. Furthermore, the conventional discriminator is separated into two waveform discriminators, one modeling voiced speech and the other modeling unvoiced speech. Because the two discriminators learn the distinctive characteristics of the harmonic and noise components, respectively, the adversarial training process becomes more efficient, allowing the generator to produce more realistic speech waveforms. Subjective test results demonstrate the superiority of the proposed method over the conventional Parallel WaveGAN and WaveNet systems. In particular, our speaker-independently trained model within a FastSpeech 2-based text-to-speech framework achieves mean opinion scores of 4.20, 4.18, 4.21, and 4.31 for four Japanese speakers, respectively.
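The two ideas in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the abstract does not say how the waveform is routed to the two discriminators, so the sample-level masking below is an assumption, and the score function only follows the generic form of a projection discriminator (an unconditional term plus an inner product between the condition and the features). All names (`upsample_vuv`, `split_by_voicing`, `projection_score`, `hop_size`) are hypothetical.

```python
from typing import List, Sequence, Tuple


def upsample_vuv(vuv_frames: Sequence[int], hop_size: int) -> List[int]:
    """Repeat each frame-level voicing flag hop_size times to reach sample level."""
    return [int(flag) for flag in vuv_frames for _ in range(hop_size)]


def split_by_voicing(
    waveform: Sequence[float], vuv_frames: Sequence[int], hop_size: int
) -> Tuple[List[float], List[float]]:
    """Assumed routing: mask the waveform into voiced and unvoiced parts.

    The voiced part would go to one discriminator and the unvoiced part
    to the other; samples outside each region are zeroed.
    """
    mask = upsample_vuv(vuv_frames, hop_size)
    if len(mask) != len(waveform):
        raise ValueError("waveform length must equal len(vuv_frames) * hop_size")
    voiced = [x * m for x, m in zip(waveform, mask)]
    unvoiced = [x * (1 - m) for x, m in zip(waveform, mask)]
    return voiced, unvoiced


def projection_score(
    features: Sequence[float],
    condition: Sequence[float],
    weight: Sequence[float],
    bias: float = 0.0,
) -> float:
    """Generic projection-style score for one time step.

    unconditional = <weight, features> + bias
    projection    = <condition, features>
    The condition is assumed to be already embedded to the feature size.
    """
    unconditional = sum(w * h for w, h in zip(weight, features)) + bias
    projection = sum(c * h for c, h in zip(condition, features))
    return unconditional + projection


# Toy usage: two frames (voiced, then unvoiced) with a hop size of 2 samples.
wave = [0.5, -0.5, 0.25, -0.25]
voiced, unvoiced = split_by_voicing(wave, vuv_frames=[1, 0], hop_size=2)
score = projection_score(features=[1.0, 2.0], condition=[0.5, 0.5], weight=[1.0, 0.0])
```

In this toy setup the two masked signals sum back to the original waveform, so no sample is seen by both discriminators. A real system would use learned convolutional feature extractors and a learned embedding of the acoustic features in place of the fixed vectors here.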

research
10/25/2019

Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram

We propose Parallel WaveGAN, a distillation-free, fast, and small-footpr...
research
01/19/2021

Improved parallel WaveGAN vocoder with perceptually weighted spectrogram loss

This paper proposes a spectral-domain perceptual weighting technique for...
research
07/31/2023

DiffProsody: Diffusion-based Latent Prosody Generation for Expressive Speech Synthesis with Prosody Conditional Adversarial Training

Expressive text-to-speech systems have undergone significant advancement...
research
09/26/2022

Multi-Task Adversarial Training Algorithm for Multi-Speaker Neural Text-to-Speech

We propose a novel training algorithm for a multi-speaker neural text-to...
research
09/23/2017

Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks

A method for statistical parametric speech synthesis incorporating gener...
research
07/26/2021

Facetron: Multi-speaker Face-to-Speech Model based on Cross-modal Latent Representations

In this paper, we propose an effective method to synthesize speaker-spec...
research
10/08/2019

MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis

Previous works have found that generating coherent raw audio wave...
