Analyzing Language-Independent Speaker Anonymization Framework under Unseen Conditions

03/28/2022
by Xiaoxiao Miao, et al.

In our previous work, we proposed a language-independent speaker anonymization system based on self-supervised learning models. Although the system can anonymize speech in any language, the anonymization was imperfect and the content of the anonymized speech was distorted. These limitations are more severe when the input speech comes from a domain unseen in the training data. This study analyzes the bottleneck of the anonymization system under unseen conditions. We found that domain mismatch (e.g., in language and channel) between the training and test data affected the neural waveform vocoder and the anonymized speaker vectors, which limited the performance of the whole system. Increasing the diversity of the vocoder's training data helped reduce its implicit language and channel dependency. Furthermore, a simple correlation-alignment-based domain adaptation strategy significantly alleviated the mismatch in the anonymized speaker vectors. Audio samples and source code are available online.
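
To make the correlation-alignment idea concrete, here is a minimal NumPy sketch of the generic CORAL recipe: whiten the source-domain vectors with their own covariance, then re-color them with the target-domain covariance. This is not the authors' released code. The function names (`coral_align`, `_sym_matrix_power`), the `eps` regularizer, the additional mean shift, and the choice of which set of speaker vectors plays the source or target role are all assumptions made for illustration.

```python
import numpy as np


def _sym_matrix_power(mat, power):
    """Raise a symmetric positive semi-definite matrix to a real power."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 1e-12, None)  # guard against tiny negative eigenvalues
    return (vecs * vals ** power) @ vecs.T


def coral_align(source, target, eps=1e-3):
    """Align second-order statistics of `source` vectors to `target` vectors.

    source, target: arrays of shape (num_vectors, dim), e.g. speaker
    embeddings from two domains (hypothetical inputs for this sketch).
    eps: diagonal regularizer (an assumption; keeps the covariances invertible).
    """
    source = np.asarray(source, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    dim = source.shape[1]

    mu_s = source.mean(axis=0)
    mu_t = target.mean(axis=0)
    cov_s = np.cov(source, rowvar=False) + eps * np.eye(dim)
    cov_t = np.cov(target, rowvar=False) + eps * np.eye(dim)

    whiten = _sym_matrix_power(cov_s, -0.5)   # removes source correlations
    recolor = _sym_matrix_power(cov_t, 0.5)   # imposes target correlations

    # Mean shifting is an extra step beyond plain covariance alignment.
    return (source - mu_s) @ whiten @ recolor + mu_t
```

After alignment, the transformed vectors have approximately the same mean and covariance as the target-domain vectors, which is the sense in which the domain mismatch is reduced. How the aligned vectors are then used inside the anonymization pipeline is not specified in this abstract.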

