Single-channel speech separation using Soft-minimum Permutation Invariant Training

11/16/2021
by Midia Yousefi, et al.

The goal of speech separation is to extract multiple speech sources from a single microphone recording. Recently, with the advancement of deep learning and the availability of large datasets, speech separation has been formulated as a supervised learning problem. These approaches aim to learn discriminative patterns of speech, speakers, and background noise using a supervised learning algorithm, typically a deep neural network. A long-standing problem in supervised speech separation is finding the correct label for each separated speech signal, referred to as label permutation ambiguity. Permutation ambiguity is the problem of determining the output-label assignment between the separated sources and the available single-speaker speech labels. Finding the best output-label assignment is required to calculate the separation error, which is then used to update the model's parameters. Recently, Permutation Invariant Training (PIT) has been shown to be a promising solution to the label ambiguity problem. However, PIT's overconfident choice of the output-label assignment results in a sub-optimally trained model. In this work, we propose a probabilistic optimization framework to address the inefficiency of PIT in finding the best output-label assignment. Our proposed method, termed trainable Soft-minimum PIT, is then employed on the same Long Short-Term Memory (LSTM) architecture used in the PIT speech separation method. The results of our experiments show that the proposed method significantly outperforms conventional PIT speech separation (p-value < 0.01) by +1 dB in Signal to Distortion Ratio (SDR) and +1.5 dB in Signal to Interference Ratio (SIR).
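The exact Soft-minimum formulation is given in the paper; as a rough sketch, conventional PIT scores every output-label permutation and commits to the single cheapest one, whereas a soft minimum can be realized as a smooth log-sum-exp relaxation of that hard minimum (the temperature parameter `gamma` below, and the pure-Python pairwise-loss setup, are illustrative assumptions, not the paper's implementation):

```python
import itertools
import math

def pit_losses(outputs, labels, pair_loss):
    """Total loss for every possible output-label assignment (permutation)."""
    n = len(outputs)
    return [
        sum(pair_loss(outputs[i], labels[p[i]]) for i in range(n))
        for p in itertools.permutations(range(n))
    ]

def hard_min_pit(perm_losses):
    # Conventional PIT: commit to the single best assignment.
    return min(perm_losses)

def soft_min_pit(perm_losses, gamma=5.0):
    # Illustrative soft minimum: a smooth log-sum-exp surrogate for min.
    # As gamma grows large, this recovers the hard minimum.
    m = min(perm_losses)  # subtract the minimum for numerical stability
    return m - (1.0 / gamma) * math.log(
        sum(math.exp(-gamma * (l - m)) for l in perm_losses)
    )
```

Because the soft minimum is differentiable in every permutation's loss, gradients flow through all candidate assignments instead of only the one the model is currently most confident about, which is the intuition behind avoiding PIT's overconfident early commitment.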


Related research

08/04/2019  Probabilistic Permutation Invariant Training for Speech Separation
Single-microphone, speaker-independent speech separation is normally per...

10/28/2019  Interrupted and cascaded permutation invariant training for speech separation
Permutation Invariant Training (PIT) has long been a stepping stone meth...

02/02/2019  FurcaNet: An end-to-end deep gated convolutional, long short-term memory, deep neural networks for single channel speech separation
Deep gated convolutional networks have been proved to be very effective ...

10/27/2021  Separating Long-Form Speech with Group-Wise Permutation Invariant Training
Multi-talker conversational speech processing has drawn many interests f...

08/31/2017  Joint Separation and Denoising of Noisy Multi-talker Speech using Recurrent Neural Networks and Permutation Invariant Training
In this paper we propose to use utterance-level Permutation Invariant Tr...

07/19/2017  Single-Channel Multi-talker Speech Recognition with Permutation Invariant Training
Although great progresses have been made in automatic speech recognition...

10/20/2021  Time-Domain Mapping Based Single-Channel Speech Separation With Hierarchical Constraint Training
Single-channel speech separation is required for multi-speaker speech re...
