Unsupervised Cross-Domain Speech-to-Speech Conversion with Time-Frequency Consistency

05/15/2020
by Mohammad Asif Khan, et al.

In recent years, generative adversarial network (GAN)-based models have been successfully applied to unsupervised speech-to-speech conversion. The magnitude spectrogram, with its rich yet compact view of harmonic structure, is considered a suitable representation for training these models on audio data. To reconstruct the speech signal, the neural network first generates a magnitude spectrogram, and a method such as the Griffin-Lim algorithm then estimates the corresponding phase. The problem with this procedure is that the generated magnitude spectrogram may not be consistent, i.e., it may not correspond to the short-time Fourier transform (STFT) of any time-domain signal. Consistency is required to find a phase such that the full complex spectrogram yields a natural-sounding speech waveform. In this work, we address this problem by proposing a condition that encourages spectrogram consistency during adversarial training. We demonstrate our approach on the task of converting the voice of a male speaker to that of a female speaker, and vice versa. Our experimental results on the Librispeech corpus show that the model trained with the time-frequency (TF) consistency condition yields perceptually better speech-to-speech conversion quality.
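The reconstruction step and the notion of consistency described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the frame length (256), hop size (64), periodic Hann window, and the helper names `stft`, `istft`, `inconsistency`, `griffin_lim`, and `spectral_convergence` are hypothetical choices made for illustration. A complex spectrogram X is consistent when it is the STFT of some signal, so STFT(ISTFT(X)) reproduces X; the relative gap between the two is one simple way to quantify inconsistency. Griffin-Lim alternates between that re-projection and re-imposing the target magnitude.

```python
import numpy as np

N_FFT = 256  # assumed frame length (hypothetical)
HOP = 64     # assumed hop size, i.e. 75% overlap (hypothetical)


def _window(n_fft):
    # Periodic Hann window.
    return np.hanning(n_fft + 1)[:-1]


def stft(x, n_fft=N_FFT, hop=HOP):
    """One-sided STFT of a real signal; frequency bins are rows, frames are columns."""
    win = _window(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)], axis=1)
    return np.fft.rfft(frames, axis=0)


def istft(X, n_fft=N_FFT, hop=HOP):
    """Least-squares inverse STFT (weighted overlap-add, as in Griffin and Lim 1984)."""
    win = _window(n_fft)
    n_frames = X.shape[1]
    length = n_fft + hop * (n_frames - 1)
    frames = np.fft.irfft(X, n=n_fft, axis=0)
    x = np.zeros(length)
    norm = np.zeros(length)
    for i in range(n_frames):
        x[i * hop:i * hop + n_fft] += frames[:, i] * win
        norm[i * hop:i * hop + n_fft] += win ** 2
    # Guard against division by zero where the window sum vanishes.
    return x / np.maximum(norm, 1e-8)


def inconsistency(X):
    """Relative gap between X and the STFT of its inverse STFT (near 0 if consistent)."""
    return np.linalg.norm(X - stft(istft(X))) / np.linalg.norm(X)


def griffin_lim(mag, n_iter=100, seed=0):
    """Estimate a signal whose STFT magnitude approximates `mag`."""
    rng = np.random.default_rng(seed)
    X = mag * np.exp(2j * np.pi * rng.random(mag.shape))  # random initial phase
    for _ in range(n_iter):
        X = stft(istft(X))                     # map onto the set of consistent spectrograms
        X = mag * np.exp(1j * np.angle(X))     # keep the new phase, restore target magnitude
    return istft(X)


def spectral_convergence(x_hat, mag):
    """Relative magnitude error of the reconstruction."""
    return np.linalg.norm(np.abs(stft(x_hat)) - mag) / np.linalg.norm(mag)
```

For example, the STFT of a real signal (say, two sinusoids sampled at 16 kHz) scores close to zero on `inconsistency`, whereas an arbitrary complex array of the same shape, like an unconstrained generator output combined with some phase, generally does not. Running `griffin_lim` on the magnitude of the consistent spectrogram should lower `spectral_convergence` as iterations increase. How the paper turns consistency into a training condition is not detailed in this abstract.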

research
04/25/2021

An Adaptive Learning based Generative Adversarial Network for One-To-One Voice Conversion

Voice Conversion (VC) emerged as a significant domain of research in the...
research
04/08/2019

Parrotron: An End-to-End Speech-to-Speech Conversion Model and its Applications to Hearing-Impaired Speech and Speech Separation

We describe Parrotron, an end-to-end-trained speech-to-speech conversion...
research
04/06/2018

Generative adversarial network-based approach to signal reconstruction from magnitude spectrograms

In this paper, we address the problem of reconstructing a time-domain si...
research
08/06/2020

Unsupervised Cross-Domain Singing Voice Conversion

We present a wav-to-wav generative model for the task of singing voice c...
research
08/31/2018

Whispered-to-voiced Alaryngeal Speech Conversion with Generative Adversarial Networks

Most methods of voice restoration for patients suffering from aphonia ei...
research
12/06/2018

Generative Adversarial Network based Speaker Adaptation for High Fidelity WaveNet Vocoder

Neural networks based vocoders, typically the WaveNet, have achieved spe...
research
08/09/2018

Rhythm-Flexible Voice Conversion without Parallel Data Using Cycle-GAN over Phoneme Posteriorgram Sequences

Speaking rate refers to the average number of phonemes within some unit ...
