Latent linguistic embedding for cross-lingual text-to-speech and voice conversion

10/08/2020
by Hieu-Thi Luong, et al.

Because the recently proposed voice cloning system NAUTILUS can clone unseen voices using untranscribed speech, we investigate the feasibility of using it to develop a unified cross-lingual TTS/VC system. Cross-lingual speech generation is the scenario in which speech utterances are generated in the voices of target speakers in a language they did not originally speak. Such a system does not simply clone the voice of the target speaker; it essentially creates a new voice that can be considered better than the original under a specific framing. Using a well-trained English latent linguistic embedding to create a cross-lingual TTS and VC system for several German, Finnish, and Mandarin speakers included in the Voice Conversion Challenge 2020, we show that our method not only produces cross-lingual VC with high speaker similarity but can also be used seamlessly for cross-lingual TTS without any extra steps. However, subjective evaluations of perceived naturalness seemed to vary between target speakers, which is one aspect for future improvement.

research
10/31/2022

Cross-lingual Text-To-Speech with Flow-based Voice Conversion for Improved Pronunciation

This paper presents a method for end-to-end cross-lingual text-to-speech...
research
10/14/2021

Improve Cross-lingual Voice Cloning Using Low-quality Code-switched Data

Recently, sequence-to-sequence (seq-to-seq) models have been successfull...
research
10/29/2019

A novel cross-lingual voice cloning approach with a few text-free samples

In this paper, we present a cross-lingual voice cloning approach. BN fea...
research
09/15/2023

Cross-lingual Knowledge Distillation via Flow-based Voice Conversion for Robust Polyglot Text-To-Speech

In this work, we introduce a framework for cross-lingual speech synthesi...
research
12/28/2020

Building Multi lingual TTS using Cross Lingual Voice Conversion

In this paper we propose a new cross-lingual Voice Conversion (VC) appro...
research
02/03/2021

Towards Natural and Controllable Cross-Lingual Voice Conversion Based on Neural TTS Model and Phonetic Posteriorgram

Cross-lingual voice conversion (VC) is an important and challenging prob...
research
08/11/2020

Spectrum and Prosody Conversion for Cross-lingual Voice Conversion with CycleGAN

Cross-lingual voice conversion aims to change source speaker's voice to ...
