DisC-VC: Disentangled and F0-Controllable Neural Voice Conversion

10/20/2022
by   Chihiro Watanabe, et al.
0

Voice conversion is a task to convert a non-linguistic feature of a given utterance. Since naturalness of speech strongly depends on its pitch pattern, in some applications, it would be desirable to keep the original rise/fall pitch pattern while changing the speaker identity. Some of the existing methods address this problem by either using a source-filter model or developing a neural network that takes an F0 pattern as input to the model. Although the latter approach can achieve relatively high sound quality compared to the former one, there is no consideration for discrepancy between the target and generated F0 patterns in its training process. In this paper, we propose a new variational-autoencoder-based voice conversion model accompanied by an auxiliary network, which ensures that the conversion result correctly reflects the specified F0/timbre information. We show the effectiveness of the proposed method by objective and subjective evaluations.

READ FULL TEXT
research
07/11/2021

Many-to-Many Voice Conversion based Feature Disentanglement using Variational Autoencoder

Voice conversion is a challenging task which transforms the voice charac...
research
03/11/2019

Singing voice conversion with non-parallel data

Singing voice conversion is a task to convert a song sang by a source si...
research
08/15/2018

Investigation of Using Disentangled and Interpretable Representations for One-shot Cross-lingual Voice Conversion

We study the problem of cross-lingual voice conversion in non-parallel s...
research
06/15/2021

Pathological voice adaptation with autoencoder-based voice conversion

In this paper, we propose a new approach to pathological speech synthesi...
research
11/23/2022

Voice-preserving Zero-shot Multiple Accent Conversion

Most people who have tried to learn a foreign language would have experi...
research
08/13/2018

ACVAE-VC: Non-parallel many-to-many voice conversion with auxiliary classifier variational autoencoder

This paper proposes a non-parallel many-to-many voice conversion (VC) me...
research
07/26/2021

Beyond Voice Identity Conversion: Manipulating Voice Attributes by Adversarial Learning of Structured Disentangled Representations

Voice conversion (VC) consists of digitally altering the voice of an ind...

Please sign up or login with your details

Forgot password? Click here to reset