Narrow-band Deep Filtering for Multichannel Speech Enhancement

11/25/2019
by Xiaofei Li, et al.

In this paper we address the problem of multichannel speech enhancement in the short-time Fourier transform (STFT) domain and in the framework of sequence-to-sequence deep learning. A long short-term memory (LSTM) network takes as input a sequence of STFT coefficients associated with a frequency bin of multichannel noisy-speech signals. The network's output is a sequence of single-channel cleaned speech at the same frequency bin. We propose several clean-speech network targets, namely, the magnitude ratio mask, the complex ideal ratio mask, the STFT coefficients, and spatial filtering. A prominent feature of the proposed model is that the same LSTM architecture, with identical parameters, is trained across frequency bins. The proposed method is referred to as narrow-band deep filtering. This choice is in contrast with traditional wide-band speech enhancement methods. The proposed deep filter is able to discriminate between speech and noise by exploiting their different temporal and spatial characteristics: speech is non-stationary and spatially coherent, while noise is relatively stationary and weakly correlated across channels. This is similar in spirit to unsupervised techniques such as spectral subtraction and beamforming. We describe extensive experiments with both mixed signals (noise added to clean speech) and real signals (live recordings). We empirically evaluate the proposed architecture variants using speech enhancement and speech recognition metrics, and we compare our results with those obtained with several state-of-the-art methods. In the light of these experiments we conclude that narrow-band deep filtering achieves very good performance and excellent generalization capabilities in terms of speaker variability and noise type.
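To make the narrow-band idea concrete, below is a minimal sketch (not the authors' code) of how a single LSTM, shared with identical parameters across all frequency bins, can map the multichannel STFT sequence of one bin to a single-channel magnitude ratio mask for that bin. The PyTorch framing, the class name NarrowBandLSTM, the hidden size, and the layer count are illustrative assumptions; the magnitude-ratio-mask target is just one of the variants mentioned in the abstract.

```python
# Minimal sketch of narrow-band deep filtering (illustrative, hypothetical names):
# one LSTM processes the STFT sequence of a single frequency bin and is applied
# to every bin with the same parameters.

import torch
import torch.nn as nn


class NarrowBandLSTM(nn.Module):
    def __init__(self, num_channels: int, hidden_size: int = 256):
        super().__init__()
        # Input per time step: real and imaginary parts of the STFT
        # coefficients of all microphone channels at one frequency bin.
        self.lstm = nn.LSTM(
            input_size=2 * num_channels,
            hidden_size=hidden_size,
            num_layers=2,
            batch_first=True,
        )
        # Output per time step: a magnitude ratio mask in [0, 1] for the
        # single-channel enhanced speech at that bin.
        self.head = nn.Sequential(nn.Linear(hidden_size, 1), nn.Sigmoid())

    def forward(self, stft_bin: torch.Tensor) -> torch.Tensor:
        # stft_bin: complex tensor of shape (batch, frames, channels),
        # i.e. the multichannel STFT sequence of ONE frequency bin.
        x = torch.cat([stft_bin.real, stft_bin.imag], dim=-1)  # (B, T, 2C)
        h, _ = self.lstm(x)
        return self.head(h).squeeze(-1)  # (B, T) mask for this bin


# Usage: since the same network serves every frequency bin, the bins can be
# folded into the batch dimension and processed in parallel.
if __name__ == "__main__":
    B, F, T, C = 1, 257, 100, 2  # batch, frequency bins, frames, microphones
    noisy_stft = torch.randn(B, F, T, C, dtype=torch.complex64)
    model = NarrowBandLSTM(num_channels=C)
    masks = model(noisy_stft.reshape(B * F, T, C)).reshape(B, F, T)
    # Apply the mask to a reference channel to obtain the enhanced STFT.
    enhanced = masks * noisy_stft[..., 0]
    print(enhanced.shape)  # torch.Size([1, 257, 100])
```

The key design choice illustrated here is that the network never sees the full spectrum at once: it can only discriminate speech from noise through the temporal dynamics and inter-channel structure within each narrow band, as described in the abstract.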

