High-quality nonparallel voice conversion based on cycle-consistent adversarial network

04/02/2018
by Fuming Fang, et al.

Although voice conversion (VC) algorithms have achieved remarkable success with advances in machine learning, high performance is still difficult to achieve when training on nonparallel data. In this paper, we propose using a cycle-consistent adversarial network (CycleGAN) to train VC models on nonparallel data. A CycleGAN is a generative adversarial network (GAN) originally developed for unpaired image-to-image translation. A subjective evaluation of inter-gender conversion demonstrated that the proposed method significantly outperformed both a method based on Merlin, an open-source neural network speech synthesis system (adapted for our setup as a parallel VC system), and a GAN-based parallel VC system. This is the first study to show that the performance of a nonparallel VC method can exceed that of state-of-the-art parallel VC methods.
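To illustrate why a CycleGAN can be trained without parallel utterances, here is a minimal sketch of the generator-side objective in the general CycleGAN formulation: two adversarial terms plus a cycle-consistency term that asks each converted frame to map back to its original. It is not the authors' implementation. The least-squares adversarial loss, the L1 cycle loss, the weight `lambda_cyc = 10.0`, the `Frame` type, and the toy callables standing in for neural generators and discriminators are all assumptions made for illustration.

```python
# Minimal sketch of a CycleGAN generator objective for nonparallel VC.
# ASSUMPTIONS (not from the paper): least-squares adversarial loss,
# L1 cycle-consistency loss, lambda_cyc = 10.0, and plain callables
# standing in for the neural generators and discriminators.
from typing import Callable, List

Frame = List[float]  # hypothetical: one frame of spectral features


def l1(a: Frame, b: Frame) -> float:
    """Mean absolute difference between two equal-length frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def generator_loss(
    xs: List[Frame],                 # source-speaker frames (unpaired)
    ys: List[Frame],                 # target-speaker frames (unpaired)
    g_xy: Callable[[Frame], Frame],  # hypothetical source -> target generator
    g_yx: Callable[[Frame], Frame],  # hypothetical target -> source generator
    d_x: Callable[[Frame], float],   # hypothetical source-domain discriminator
    d_y: Callable[[Frame], float],   # hypothetical target-domain discriminator
    lambda_cyc: float = 10.0,
) -> float:
    # Adversarial terms (least-squares form): each generator wants the
    # opposite-domain discriminator to score its outputs as real (1.0).
    adv_xy = sum((d_y(g_xy(x)) - 1.0) ** 2 for x in xs) / len(xs)
    adv_yx = sum((d_x(g_yx(y)) - 1.0) ** 2 for y in ys) / len(ys)
    # Cycle consistency: converting to the other speaker and back should
    # reconstruct the input, which replaces the need for aligned pairs.
    cyc_x = sum(l1(g_yx(g_xy(x)), x) for x in xs) / len(xs)
    cyc_y = sum(l1(g_xy(g_yx(y)), y) for y in ys) / len(ys)
    return adv_xy + adv_yx + lambda_cyc * (cyc_x + cyc_y)


if __name__ == "__main__":
    frames_x = [[0.5, -1.0], [1.5, 2.0]]
    frames_y = [[1.0, -2.0], [3.0, 4.0]]
    # Toy generators that are exact inverses, and discriminators that
    # accept everything, so both loss components are zero.
    loss = generator_loss(
        frames_x, frames_y,
        g_xy=lambda f: [2.0 * v for v in f],
        g_yx=lambda f: [0.5 * v for v in f],
        d_x=lambda f: 1.0,
        d_y=lambda f: 1.0,
    )
    print(loss)  # 0.0
```

Note that `xs` and `ys` never need to be time-aligned or even the same length; only the round trip through both generators is compared against the input, which is what makes the setup suitable for nonparallel data.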
