On The Compensation Between Magnitude and Phase in Speech Separation

08/11/2021
by Zhong-Qiu Wang, et al.

Deep neural network (DNN) based end-to-end optimization in the complex time-frequency (T-F) domain or the time domain has shown considerable potential in monaural speech separation. Many recent studies optimize loss functions defined solely in the time or complex domain, without including a loss on magnitude. Although such loss functions typically produce better scores when the evaluation metrics are objective time-domain metrics, they produce worse scores on speech quality and intelligibility metrics, and usually lead to worse speech recognition performance, than loss functions that include a loss on magnitude. While this phenomenon has been observed experimentally in many studies, it is often not accurately explained, and a thorough understanding of its fundamental cause is lacking. This paper provides a novel view from the perspective of the implicit compensation between estimated magnitude and phase. Analytical results based on monaural speech separation and robust automatic speech recognition (ASR) tasks in noisy-reverberant conditions support the validity of our view.
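The abstract does not spell out the compensation mechanism, so the following is only a minimal sketch of one way it can arise, not the paper's own analysis. It assumes a single T-F unit, a squared-error loss in the complex domain, and a fixed phase error; the function names and the numeric values are hypothetical. Under those assumptions, the magnitude that minimizes the complex-domain loss is the projection of the target onto the estimated phase direction, which is smaller than the true magnitude whenever the phase is wrong.

```python
import numpy as np


def best_magnitude_given_phase(target, est_phase):
    """Magnitude m >= 0 that minimizes |target - m * exp(1j * est_phase)|**2."""
    # Project the target onto the unit phasor of the estimated phase;
    # a negative projection is clipped because a magnitude cannot be negative.
    projection = np.real(target * np.exp(-1j * est_phase))
    return np.maximum(projection, 0.0)


def complex_l2(target, estimate):
    """Squared error in the complex domain (hypothetical loss for this sketch)."""
    return np.abs(target - estimate) ** 2


def magnitude_l1(target, estimate):
    """Absolute error between magnitudes (hypothetical loss for this sketch)."""
    return np.abs(np.abs(target) - np.abs(estimate))


# Hypothetical clean T-F unit with magnitude 1 and phase 0.3 rad.
target = 1.0 * np.exp(1j * 0.3)

# Assumed phase error of 60 degrees in the estimate.
phase_error = np.pi / 3
est_phase = np.angle(target) + phase_error

# Estimate A: magnitude chosen to minimize the complex-domain loss.
m_star = best_magnitude_given_phase(target, est_phase)
compensated = m_star * np.exp(1j * est_phase)

# Estimate B: the true magnitude paired with the same wrong phase.
true_mag = np.abs(target) * np.exp(1j * est_phase)

print("compensated magnitude:", m_star)
print("complex L2, compensated vs true magnitude:",
      complex_l2(target, compensated), complex_l2(target, true_mag))
print("magnitude L1, compensated vs true magnitude:",
      magnitude_l1(target, compensated), magnitude_l1(target, true_mag))
```

In this toy setting the compensated estimate scores better on the complex-domain loss while its magnitude is further from the target, which is consistent with the trade-off the abstract describes between time-domain metrics and magnitude-sensitive metrics.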

