Speakerfilter-Pro: an improved target speaker extractor combines the time domain and frequency domain

10/25/2020
by Shulin He, et al.

This paper introduces an improved target speaker extractor, referred to as Speakerfilter-Pro, based on our previous Speakerfilter model. The Speakerfilter uses a bidirectional gated recurrent unit (BGRU) module to characterize the target speaker from anchor speech and a convolutional recurrent network (CRN) module to separate the target speech from a noisy signal. Unlike the Speakerfilter, the Speakerfilter-Pro adds a WaveUNet module at the beginning and another at the end of the network. The WaveUNet has been shown to have a better ability to perform speech separation in the time domain. To extract the target speaker information better, the complex spectrum, instead of the magnitude spectrum, is used as the input feature for the CRN module. Experiments are conducted on the two-speaker dataset (WSJ0-mix2), which is widely used for speaker extraction. The systematic evaluation shows that the Speakerfilter-Pro outperforms the Speakerfilter and other baselines, achieving a signal-to-distortion ratio (SDR) of 14.95 dB.
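
The difference between magnitude-spectrum and complex-spectrum input features can be illustrated with a small sketch. This is not the authors' implementation: the function names, the naive Hann-windowed STFT, the 8 kHz sampling rate, and the frame and hop sizes below are all assumptions chosen for illustration. The sketch only shows that stacking the real and imaginary parts as two channels preserves phase, while the magnitude alone discards it.

```python
import numpy as np

def stft(signal, frame_len=512, hop=256):
    """Naive STFT (illustrative): Hann-windowed frames followed by a one-sided FFT.

    Returns a complex array of shape (frames, frame_len // 2 + 1).
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)

def magnitude_features(spec):
    """Magnitude spectrum only; phase information is dropped."""
    return np.abs(spec)

def complex_features(spec):
    """Real and imaginary parts stacked as two channels; phase is kept."""
    return np.stack([spec.real, spec.imag], axis=0)

# Hypothetical test signal: one second of a 440 Hz tone at an assumed 8 kHz rate.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
x = np.sin(2 * np.pi * 440 * t)

spec = stft(x)
mag = magnitude_features(spec)    # shape (frames, bins)
cplx = complex_features(spec)     # shape (2, frames, bins)

# A polarity-flipped copy has the same magnitude spectrum but different
# complex features, so only the complex input can tell the two apart.
flipped_spec = stft(-x)
same_magnitude = np.allclose(magnitude_features(flipped_spec), mag)
same_complex = np.allclose(complex_features(flipped_spec), cplx)
print(mag.shape, cplx.shape, same_magnitude, same_complex)
```

In this sketch the two-channel array could be fed to a convolutional front end in place of the single-channel magnitude; how the Speakerfilter-Pro actually arranges its CRN input is described in the full paper, not here.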
