Leveraging Heteroscedastic Uncertainty in Learning Complex Spectral Mapping for Single-channel Speech Enhancement

11/16/2022
by Kuan-Lin Chen et al.

Most speech enhancement (SE) models learn a point estimate and do not use uncertainty estimation during training. In this paper, we show that modeling heteroscedastic uncertainty by minimizing a multivariate Gaussian negative log-likelihood (NLL) improves SE performance at no extra cost. During training, our approach augments a model that learns complex spectral mapping with a temporary submodel that predicts the covariance of the enhancement error at each time-frequency bin. Because the heteroscedastic uncertainty is unrestricted, the covariance introduces an undersampling effect that is detrimental to SE performance. To mitigate undersampling, our approach inflates the uncertainty lower bound and weights each loss component by its uncertainty, effectively compensating severely undersampled components with larger penalties. Our multivariate setting reveals common covariance assumptions, such as scalar and diagonal matrices. By weakening these assumptions, we show that the NLL outperforms popular losses, including the mean squared error (MSE), mean absolute error (MAE), and scale-invariant signal-to-distortion ratio (SI-SDR).
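The abstract does not spell out the loss, so the following is only a minimal sketch under stated assumptions. It treats the real and imaginary parts of the enhancement error at each time-frequency bin as a bivariate Gaussian with a full 2x2 covariance, parameterized by a lower-triangular Cholesky factor that a hypothetical covariance submodel would output. The names `bivariate_gaussian_nll`, `raw_cov`, and `floor` are illustrative, and adding a floor to the Cholesky diagonal is just one possible way to realize an "uncertainty lower bound". The paper's uncertainty-based weighting of loss components is not reproduced here.

```python
import numpy as np


def softplus(x):
    # Numerically stable softplus: log(1 + exp(x)), always positive.
    return np.logaddexp(0.0, x)


def bivariate_gaussian_nll(pred, target, raw_cov, floor=1e-3):
    """Mean Gaussian NLL (up to an additive constant) of the complex error.

    pred, target: complex arrays of shape (T, F), estimated and clean spectra.
    raw_cov: real array of shape (T, F, 3), unconstrained outputs of a
        hypothetical covariance submodel, mapped per bin to a Cholesky factor
        L = [[l11, 0], [l21, l22]] so that Sigma = L @ L.T.
    floor: hypothetical lower bound added to the diagonal of L; raising it
        limits how small the predicted uncertainty can become.
    """
    err = pred - target
    e_re, e_im = err.real, err.imag
    l11 = softplus(raw_cov[..., 0]) + floor
    l21 = raw_cov[..., 1]
    l22 = softplus(raw_cov[..., 2]) + floor
    # Forward substitution solves L z = e, so z1^2 + z2^2 = e^T Sigma^{-1} e.
    z1 = e_re / l11
    z2 = (e_im - l21 * z1) / l22
    mahalanobis = z1 ** 2 + z2 ** 2
    # For Sigma = L L^T, log det Sigma = 2 * (log l11 + log l22).
    log_det = 2.0 * (np.log(l11) + np.log(l22))
    return np.mean(0.5 * mahalanobis + 0.5 * log_det)


# Usage: with the covariance fixed to the identity, the NLL equals MSE / 2.
rng = np.random.default_rng(0)
T, F = 4, 5
target = rng.standard_normal((T, F)) + 1j * rng.standard_normal((T, F))
noise = rng.standard_normal((T, F)) + 1j * rng.standard_normal((T, F))
pred = target + 0.1 * noise
raw_identity = np.zeros((T, F, 3))
# softplus(log(e - 1)) = 1, so with floor=0 the Cholesky factor is L = I.
raw_identity[..., 0] = raw_identity[..., 2] = np.log(np.expm1(1.0))
nll = bivariate_gaussian_nll(pred, target, raw_identity, floor=0.0)
mse = np.mean(np.abs(pred - target) ** 2)
print(np.isclose(nll, 0.5 * mse))  # True
```

The identity case illustrates the link between covariance assumptions and familiar losses: with a fixed scalar covariance, the NLL is proportional to the MSE, while fixing `l21 = 0` gives a diagonal covariance with independent real and imaginary uncertainties. Letting `l21` vary is what a full (weaker-assumption) covariance adds.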
