Speaker Anonymization with Distribution-Preserving X-Vector Generation for the VoicePrivacy Challenge 2020

by   Henry Turner, et al.

In this paper, we present a Distribution-Preserving Voice Anonymization technique, as our submission to the VoicePrivacy Challenge 2020. We notice that the challenge baseline system generates fake X-vectors which are very similar to each other, significantly more so than those extracted from organic speakers. This difference arises from averaging many X-vectors from a pool of speakers in the anonymization processs, causing a loss of information. We propose a new method to generate fake X-vectors which overcomes these limitations by preserving the distributional properties of X-vectors and their intra-similarity. We use population data to learn the properties of the X-vector space, before fitting a generative model which we use to sample fake X-vectors. We show how this approach generates X-vectors that more closely follow the expected intra-similarity distribution of organic speaker X-vectors. Our method can be easily integrated with others as the anonymization component of the system and removes the need to distribute a pool of speakers to use during the anonymization. Our approach leads to an increase in EER of up to 16.8% in males and 8.4% in females in scenarios where enrollment and trial utterances are anonymized versus the baseline solution, demonstrating the diversity of our generated voices.


page 1

page 2

page 3

page 4


Language-independent speaker anonymization using orthogonal Householder neural network

Speaker anonymization aims to conceal a speaker's identity while preserv...

Text-Independent Speaker Verification Based on Deep Neural Networks and Segmental Dynamic Time Warping

In this paper we present a new method for text-independent speaker verif...

DNN-based Speaker Embedding Using Subjective Inter-speaker Similarity for Multi-speaker Modeling in Speech Synthesis

This paper proposes novel algorithms for speaker embedding using subject...

STEM: Unsupervised STructural EMbedding for Stance Detection

Stance detection is an important task, supporting many downstream tasks ...

MR-SVS: Singing Voice Synthesis with Multi-Reference Encoder

Multi-speaker singing voice synthesis is to generate the singing voice s...

Voice Separation with an Unknown Number of Multiple Speakers

We present a new method for separating a mixed audio sequence, in which ...

Monaural Audio Speaker Separation with Source Contrastive Estimation

We propose an algorithm to separate simultaneously speaking persons from...

Please sign up or login with your details

Forgot password? Click here to reset