VoicePrivacy 2022 System Description: Speaker Anonymization with Feature-matched F0 Trajectories

10/31/2022
by Ünal Ege Gaznepoglu, et al.

We introduce a novel method to improve the performance of the VoicePrivacy Challenge 2022 baseline B1 variants. A known deficiency of x-vector-based anonymization systems is the insufficient disentanglement of the input features. In particular, the fundamental frequency (F0) trajectories are passed to voice synthesis without any modification. Especially in cross-gender conversion, this yields unnatural-sounding voices, higher word error rates (WERs), and leakage of personal information. Our submission overcomes this problem by synthesizing an F0 trajectory that better harmonizes with the anonymized x-vector. We use a low-complexity deep neural network to estimate an appropriate F0 value per frame from the linguistic content in the bottleneck (BN) features and the anonymized x-vector. Our approach results in a significantly improved anonymization system and increased naturalness of the synthesized voice. Consequently, our results suggest that F0 extraction is not required for voice anonymization.
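
The per-frame estimation described in the abstract can be illustrated with a minimal sketch. It is not the authors' network: the layer sizes, the random untrained weights, the assumed 50 to 400 Hz output range, the voicing flag, and all names (`make_layer`, `predict_f0`, `BN_DIM`, `XVEC_DIM`) are hypothetical choices made only to show how BN features and a single utterance-level x-vector could be combined to yield one F0 value per frame.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Create random weights for one dense layer (untrained, illustration only)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(layer, x):
    """Apply an affine transform: one output per weight row."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def predict_f0(bn_frames, xvector, hidden, out, min_f0=50.0, max_f0=400.0):
    """Return one F0 value in Hz per frame; 0.0 marks a frame treated as unvoiced.

    The F0 range and the voicing output are assumptions of this sketch.
    """
    f0 = []
    for bn in bn_frames:
        # The same anonymized x-vector is appended to every frame's BN features.
        features = list(bn) + list(xvector)
        h = relu(dense(hidden, features))
        pitch_logit, voicing_logit = dense(out, h)
        # Sigmoid written with tanh to avoid overflow, then mapped to [min_f0, max_f0].
        sig = 0.5 * (1.0 + math.tanh(0.5 * pitch_logit))
        hz = min_f0 + (max_f0 - min_f0) * sig
        f0.append(hz if voicing_logit > 0.0 else 0.0)
    return f0


# Toy dimensions, chosen small so the example runs instantly.
BN_DIM, XVEC_DIM, HIDDEN_DIM, N_FRAMES = 8, 4, 16, 5
rng = random.Random(0)
hidden = make_layer(BN_DIM + XVEC_DIM, HIDDEN_DIM, rng)
out = make_layer(HIDDEN_DIM, 2, rng)

bn_frames = [[rng.gauss(0.0, 1.0) for _ in range(BN_DIM)] for _ in range(N_FRAMES)]
xvector = [rng.gauss(0.0, 1.0) for _ in range(XVEC_DIM)]

f0 = predict_f0(bn_frames, xvector, hidden, out)
print(f0)
```

Because the F0 trajectory in this sketch depends only on the BN features and the x-vector, no F0 extraction from the input speech is involved, which is the property the abstract highlights. A trained model would replace the random weights.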

research
02/21/2022

AVQVC: One-shot Voice Conversion by Vector Quantization with applying contrastive learning

Voice Conversion (VC) refers to changing the timbre of a speech while ret...
research
08/05/2020

Recognition-Synthesis Based Non-Parallel Voice Conversion with Adversarial Learning

This paper presents an adversarial learning method for recognition-synth...
research
08/19/2023

Effects of Convolutional Autoencoder Bottleneck Width on StarGAN-based Singing Technique Conversion

Singing technique conversion (STC) refers to the task of converting from...
research
11/06/2021

SIG-VC: A Speaker Information Guided Zero-shot Voice Conversion System for Both Human Beings and Machines

Nowadays, as more and more systems achieve good performance in tradition...
research
10/31/2020

AGAIN-VC: A One-shot Voice Conversion using Activation Guidance and Adaptive Instance Normalization

Recently, voice conversion (VC) has been widely studied. Many VC systems...
research
07/26/2021

Beyond Voice Identity Conversion: Manipulating Voice Attributes by Adversarial Learning of Structured Disentangled Representations

Voice conversion (VC) consists of digitally altering the voice of an ind...
research
03/26/2019

WGANSing: A Multi-Voice Singing Voice Synthesizer Based on the Wasserstein-GAN

We present a deep neural network based singing voice synthesizer, inspir...
