Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram

10/25/2019
by Ryuichi Yamamoto, et al.

We propose Parallel WaveGAN, a distillation-free, fast, and small-footprint waveform generation method based on a generative adversarial network. In the proposed method, a non-autoregressive WaveNet is trained by jointly optimizing a multi-resolution spectrogram loss and an adversarial loss, which together effectively capture the time-frequency distribution of realistic speech waveforms. Because the method does not require the probability density distillation used in the conventional teacher-student framework, the entire model can be trained easily, even with a small number of parameters. In particular, the proposed Parallel WaveGAN has only 1.44 M parameters and can generate 24 kHz speech waveforms 28.68 times faster than real time on a single GPU. Perceptual listening tests show that the proposed method achieves a mean opinion score (MOS) of 4.16 within a Transformer-based text-to-speech framework, which is comparable to the best distillation-based Parallel WaveNet system.
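The abstract names the training objective but not its exact form. What follows is a minimal NumPy sketch, under stated assumptions, of a multi-resolution spectrogram loss combined with an adversarial term. It assumes that each resolution's loss is spectral convergence plus a mean absolute log-magnitude difference, that the adversarial term is a least-squares GAN generator loss, and that the three STFT settings (FFT size, hop, window length) and the weight lambda_adv = 4.0 are illustrative choices; none of these details are stated in the abstract. The discriminator itself is omitted: d_fake is a stand-in for its scores on generated audio.

```python
import numpy as np

# Illustrative (FFT size, hop size, window length) triples: assumptions, not from the abstract.
RESOLUTIONS = [(1024, 120, 600), (2048, 240, 1200), (512, 50, 240)]


def stft_magnitude(x, fft_size, hop, win_length):
    """Magnitude STFT of a 1-D signal using a Hann window zero-padded to fft_size.

    Requires len(x) >= fft_size; frames that would run past the end are dropped.
    """
    window = np.hanning(win_length)
    pad_left = (fft_size - win_length) // 2
    window = np.pad(window, (pad_left, fft_size - win_length - pad_left))
    n_frames = 1 + (len(x) - fft_size) // hop
    frames = np.stack([x[i * hop : i * hop + fft_size] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1))  # shape: (n_frames, fft_size // 2 + 1)


def stft_loss(x, x_hat, fft_size, hop, win_length, eps=1e-7):
    """Spectral convergence plus mean absolute log-magnitude distance at one resolution."""
    s = np.maximum(stft_magnitude(x, fft_size, hop, win_length), eps)
    s_hat = np.maximum(stft_magnitude(x_hat, fft_size, hop, win_length), eps)
    spectral_convergence = np.linalg.norm(s - s_hat) / np.linalg.norm(s)  # Frobenius norms
    log_magnitude = np.mean(np.abs(np.log(s) - np.log(s_hat)))
    return spectral_convergence + log_magnitude


def multi_resolution_stft_loss(x, x_hat, resolutions=RESOLUTIONS):
    """Average the single-resolution losses over all STFT settings."""
    return np.mean([stft_loss(x, x_hat, *r) for r in resolutions])


def lsgan_generator_loss(d_fake):
    """Least-squares adversarial loss: push discriminator scores on fake audio toward 1."""
    return np.mean((1.0 - d_fake) ** 2)


def generator_objective(x, x_hat, d_fake, lambda_adv=4.0):
    """Joint objective: spectrogram loss plus weighted adversarial loss (weight is an assumption)."""
    return multi_resolution_stft_loss(x, x_hat) + lambda_adv * lsgan_generator_loss(d_fake)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(24000)  # one second of audio at 24 kHz
    x_hat = x + 0.1 * rng.standard_normal(24000)  # a slightly corrupted "generated" waveform
    d_fake = rng.uniform(0.0, 1.0, size=16)  # stand-in discriminator scores
    print(generator_objective(x, x_hat, d_fake))
```

Averaging over several STFT resolutions lets a single loss penalize errors at both fine time resolution (short windows) and fine frequency resolution (long windows), which one fixed STFT configuration cannot do at the same time.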

Related research
04/09/2019

Probability density distillation with generative adversarial networks for high-quality parallel waveform generation

This paper proposes an effective probability density distillation (PDD) ...
01/19/2021

Improved parallel WaveGAN vocoder with perceptually weighted spectrogram loss

This paper proposes a spectral-domain perceptual weighting technique for...
10/27/2020

Parallel waveform synthesis based on generative adversarial networks with voicing-aware conditional discriminators

This paper proposes voicing-aware conditional discriminators for Paralle...
11/13/2022

Autovocoder: Fast Waveform Generation from a Learned Speech Representation using Differentiable Digital Signal Processing

Most state-of-the-art Text-to-Speech systems use the mel-spectrogram as ...
04/03/2018

Speech waveform synthesis from MFCC sequences with generative adversarial networks

This paper proposes a method for generating speech from filterbank mel f...
05/21/2019

Parallel Neural Text-to-Speech

In this work, we propose a non-autoregressive seq2seq model that convert...
08/03/2020

A Spectral Energy Distance for Parallel Speech Synthesis

Speech synthesis is an important practical generative modeling problem t...
