Blind Extraction of Target Speech Source Guided by Supervised Speaker Identification via X-vectors

by Jiri Malek, et al.

This manuscript proposes a novel, robust procedure for extracting a speaker of interest (SOI) from a mixture of audio sources. The estimation of the SOI is blind and is performed via independent vector extraction. A recently proposed constant separating vector (CSV) model is employed, which improves the estimation of moving sources. The blind algorithm is guided towards the SOI by frame-wise speaker identification, which is trained in a supervised manner and is independent of any specific scenario. When processing challenging data, an incorrect speaker may be extracted due to limitations of this guidance. To identify such cases, a criterion that non-intrusively assesses the quality of the estimated SOI is proposed. It utilizes the same model as the speaker identification; therefore, no additional training is required. Using this criterion, a “deflation” approach to extraction is presented: if an incorrect source is estimated, it is subtracted from the mixture and the extraction of the SOI is performed again on the reduced mixture. The proposed procedure is experimentally tested on both artificial and real-world datasets containing challenging phenomena: source movements, reverberation, transient noise, or microphone failures. The presented method is comparable to state-of-the-art blind algorithms on static mixtures and is more accurate on mixtures containing source movements. Compared to fully supervised methods, the proposed procedure achieves lower accuracy but requires no scenario-specific training data.
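
The control flow of the "deflation" idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`extract_with_deflation`, `subtract_source`), the `max_rounds` parameter, and both toy callables are assumptions made here for illustration. In particular, the extractor is a simple SVD-based stand-in rather than independent vector extraction with the CSV model, and the acceptance check is a correlation against a known reference rather than the non-intrusive x-vector-based criterion.

```python
import numpy as np

def subtract_source(mixture, source):
    """Remove the least-squares projection of `source` from each channel.

    mixture: array of shape (channels, samples); source: array of shape (samples,).
    """
    gains = mixture @ source / (source @ source)
    return mixture - np.outer(gains, source)

def extract_with_deflation(mixture, extract, is_target, max_rounds=3):
    """Extract a candidate; if it is rejected, subtract it and try again."""
    residual = mixture
    for _ in range(max_rounds):
        candidate = extract(residual)
        if is_target(candidate):
            return candidate
        residual = subtract_source(residual, candidate)
    return None  # no acceptable candidate within max_rounds

# Toy data: two orthogonal sinusoids; the quieter one plays the role of the SOI.
t = np.arange(1000) / 1000.0
interferer = np.sin(2 * np.pi * 5 * t)
target = np.sin(2 * np.pi * 13 * t)
mixing = np.array([[2.0, 1.0], [2.0, -1.0]])  # hypothetical mixing matrix
mixture = mixing @ np.vstack([interferer, target])

def toy_extract(x):
    # Stand-in extractor (assumption): dominant temporal component via SVD.
    return np.linalg.svd(x, full_matrices=False)[2][0]

def toy_is_target(candidate):
    # Stand-in criterion (assumption): correlation with a known reference.
    corr = abs(candidate @ target) / (np.linalg.norm(candidate) * np.linalg.norm(target))
    return corr > 0.9

estimate = extract_with_deflation(mixture, toy_extract, toy_is_target)
```

In this toy setup the first candidate is the louder interferer, which the check rejects; after its subtraction, the second round recovers the target up to sign and scale.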


Adaptive blind audio source extraction supervised by dominant speaker identification using x-vectors

We propose a novel algorithm for adaptive blind audio source extraction....

Speaker-Aware Mixture of Mixtures Training for Weakly Supervised Speaker Extraction

Dominant researches adopt supervised training for speaker extraction, wh...

Semi-supervised Time Domain Target Speaker Extraction with Attention

In this work, we propose Exformer, a time-domain architecture for target...

Independent Vector Extraction Constrained on Manifold of Half-Length Filters

Independent Vector Analysis (IVA) is a popular extension of Independent ...

Efficient Independent Vector Extraction of Dominant Target Speech

The complete decomposition performed by blind source separation is compu...

Blind and neural network-guided convolutional beamformer for joint denoising, dereverberation, and source separation

This paper proposes an approach for optimizing a Convolutional BeamForme...

Histogram Transform-based Speaker Identification

A novel text-independent speaker identification (SI) method is proposed....
