Probabilistic Permutation Invariant Training for Speech Separation

08/04/2019
by Midia Yousefi, et al.

Single-microphone, speaker-independent speech separation is normally performed in two steps: (i) separating the individual speech sources, and (ii) determining the best output-label assignment in order to compute the separation error. The second step is the main obstacle in training neural networks for speech separation. The recently proposed Permutation Invariant Training (PIT) addresses this problem by choosing the output-label assignment that minimizes the separation error. In this study, we show that a major drawback of this technique is its overconfident choice of output-label assignment, especially in the initial steps of training, when the network generates unreliable outputs. To solve this problem, we propose Probabilistic PIT (Prob-PIT), which treats the output-label permutation as a discrete latent random variable with a uniform prior distribution. Prob-PIT defines a log-likelihood function based on the prior distribution and the separation errors of all permutations, and it trains the speech separation network by maximizing this log-likelihood. Prob-PIT can be easily implemented by replacing the minimum function of PIT with a soft-minimum function. We evaluate our approach for speech separation on both the TIMIT and CHiME datasets. The results show that the proposed method significantly outperforms PIT in terms of Signal-to-Distortion Ratio (SDR) and Signal-to-Interference Ratio (SIR).
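
The difference between the hard minimum of PIT and a soft minimum over all permutations can be illustrated with a minimal NumPy sketch. The abstract does not give the exact loss, so everything below is an assumption for illustration: the function names, the use of mean squared error as the separation error, the log-sum-exp form of the soft minimum, and the temperature `gamma` are all hypothetical and not taken from the paper.

```python
import itertools
import numpy as np


def permutation_errors(estimates, targets):
    """Separation error (here: MSE, an assumption) for every output-label permutation.

    estimates, targets: arrays of shape (num_sources, num_samples).
    Returns (perms, errors), where errors[k] is the error under perms[k].
    """
    num_sources = targets.shape[0]
    perms = list(itertools.permutations(range(num_sources)))
    errors = np.array(
        [np.mean((estimates - targets[list(p)]) ** 2) for p in perms]
    )
    return perms, errors


def pit_loss(errors):
    """Hard minimum: commit to the single best permutation."""
    return errors.min()


def soft_min_loss(errors, gamma=1.0):
    """Soft minimum with a uniform prior over permutations (illustrative form).

    Computes -gamma * log(mean_k exp(-errors[k] / gamma)), using the
    max-shift trick for numerical stability. gamma is a hypothetical
    temperature: small values approach the hard minimum, large values
    approach the average error over all permutations.
    """
    scaled = -errors / gamma
    shift = scaled.max()
    return -gamma * (shift + np.log(np.mean(np.exp(scaled - shift))))


if __name__ == "__main__":
    # Two toy "sources"; the network outputs them in swapped order.
    targets = np.array([[1.0, 0.0, -1.0, 0.0], [0.0, 2.0, 0.0, -2.0]])
    estimates = targets[::-1] + 0.1
    perms, errors = permutation_errors(estimates, targets)
    print("errors per permutation:", dict(zip(perms, errors)))
    print("hard-min loss:", pit_loss(errors))
    print("soft-min loss:", soft_min_loss(errors, gamma=1.0))
```

In this sketch the soft minimum always lies between the smallest error and the average error over permutations, so every permutation contributes to the loss when the errors are close together, which is the situation the abstract describes for early training.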


research
11/16/2021

Single-channel speech separation using Soft-minimum Permutation Invariant Training

The goal of speech separation is to extract multiple speech sources from...
research
07/01/2016

Permutation Invariant Training of Deep Models for Speaker-Independent Multi-talker Speech Separation

We propose a novel deep learning model, which supports permutation invar...
research
10/28/2019

Interrupted and cascaded permutation invariant training for speech separation

Permutation Invariant Training (PIT) has long been a stepping stone meth...
research
06/17/2022

Conditional Permutation Invariant Flows

We present a novel, conditional generative probabilistic model of set-va...
research
10/27/2021

Separating Long-Form Speech with Group-Wise Permutation Invariant Training

Multi-talker conversational speech processing has drawn many interests f...
research
10/20/2021

Time-Domain Mapping Based Single-Channel Speech Separation With Hierarchical Constraint Training

Single-channel speech separation is required for multi-speaker speech re...
research
04/14/2023

On Data Sampling Strategies for Training Neural Network Speech Separation Models

Speech separation remains an important area of multi-speaker signal proc...
