Generative Models for Improved Naturalness, Intelligibility, and Voicing of Whispered Speech

12/04/2022
by Dominik Wagner, et al.

This work adapts two recent generative model architectures and evaluates their effectiveness for converting whispered speech to normal speech. We incorporate the normal target speech into the training criterion of vector-quantized variational autoencoders (VQ-VAEs) and MelGANs, thereby conditioning the systems to recover voiced speech from whispered inputs. Objective and subjective quality measures indicate that both VQ-VAEs and MelGANs can be modified to perform the conversion task. We find that the proposed approaches significantly improve the Mel cepstral distortion (MCD) metric by at least 25%. Subjective tests suggest that the MelGAN-based system significantly improves naturalness, intelligibility, and voicing compared to the whispered input speech. A novel evaluation measure based on differences between latent speech representations also indicates that our MelGAN-based approach yields improvements relative to the baseline.
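For readers unfamiliar with the MCD metric mentioned above, the following is a minimal sketch of one common formulation, not the authors' evaluation code. It assumes the two utterances are already time-aligned frame by frame (many setups first align them, for example with dynamic time warping) and that the 0th (energy) coefficient is excluded. The function names `mel_cepstral_distortion` and `relative_improvement` are hypothetical, and the abstract does not state the cepstral order or alignment procedure the paper used.

```python
import numpy as np


def mel_cepstral_distortion(ref, conv):
    """Mean MCD in dB between two time-aligned mel-cepstral sequences.

    Both inputs have shape (frames, coefficients). Assumption: column 0
    holds the energy term and is skipped, a common but not universal choice.
    """
    ref = np.asarray(ref, dtype=float)
    conv = np.asarray(conv, dtype=float)
    if ref.shape != conv.shape:
        raise ValueError("inputs must be time-aligned arrays of equal shape")
    # Per-frame differences over coefficients 1..D
    diff = ref[:, 1:] - conv[:, 1:]
    # (10 / ln 10) * sqrt(2 * sum of squared differences), per frame
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    # Average over frames
    return float(np.mean(per_frame))


def relative_improvement(baseline_mcd, system_mcd):
    """Relative MCD reduction in percent (lower MCD is better)."""
    return 100.0 * (baseline_mcd - system_mcd) / baseline_mcd
```

Because lower MCD means the converted speech is closer to the normal-speech reference, a positive value from `relative_improvement` corresponds to an improvement of the kind the abstract reports.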


