Dense CNN with Self-Attention for Time-Domain Speech Enhancement

by Ashutosh Pandey, et al.

Speech enhancement in the time domain has become increasingly popular in recent years, due to its capability to jointly enhance both the magnitude and the phase of speech. In this work, we propose a dense convolutional network (DCN) with self-attention for speech enhancement in the time domain. DCN is an encoder-decoder architecture with skip connections. Each layer in the encoder and the decoder comprises a dense block and an attention module. Dense blocks and attention modules help in feature extraction through a combination of feature reuse, increased network depth, and maximum context aggregation. Furthermore, we reveal previously unknown problems with a loss based on the spectral magnitude of enhanced speech. To alleviate these problems, we propose a novel loss based on the magnitudes of the enhanced speech and the predicted noise. Even though the proposed loss is based on magnitudes only, a constraint imposed by noise prediction ensures that the loss enhances both magnitude and phase. Experimental results demonstrate that DCN trained with the proposed loss substantially outperforms other state-of-the-art approaches to causal and non-causal speech enhancement.
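The idea of a magnitude-only loss that still constrains phase can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption for illustration and is not taken from the abstract: the predicted noise is taken to be the noisy mixture minus the enhanced speech, the two magnitude terms are weighted equally, the distance is a mean absolute error, and the STFT settings (`frame_len`, `hop`, Hann window) and function names are hypothetical. The paper's exact formulation may differ.

```python
import numpy as np


def stft_magnitude(x, frame_len=512, hop=256):
    """Magnitude of a simple Hann-windowed STFT (frame settings are illustrative)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=-1))


def magnitude_loss(reference, estimate):
    """Mean absolute difference between STFT magnitudes (phase is ignored)."""
    return np.mean(np.abs(stft_magnitude(reference) - stft_magnitude(estimate)))


def magnitude_plus_noise_loss(noisy, clean, enhanced):
    """Magnitude loss on speech plus magnitude loss on the implied noise.

    Assumption: the predicted noise is noisy - enhanced, and both terms
    are weighted equally.
    """
    true_noise = noisy - clean
    predicted_noise = noisy - enhanced
    speech_term = magnitude_loss(clean, enhanced)
    noise_term = magnitude_loss(true_noise, predicted_noise)
    return 0.5 * speech_term + 0.5 * noise_term


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(4000)
    noisy = clean + 0.5 * rng.standard_normal(4000)

    # A sign-flipped estimate has the same magnitude spectrum as the clean
    # speech but the wrong phase.
    flipped = -clean
    print("speech-only magnitude loss:", magnitude_loss(clean, flipped))
    print("speech + noise loss:       ", magnitude_plus_noise_loss(noisy, clean, flipped))
```

In this sketch, a sign-flipped estimate is invisible to the speech-only magnitude loss, but it changes the implied noise signal, so the combined loss penalizes it. This is one way to see how a constraint on predicted noise can make a magnitude-based loss sensitive to phase.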



Uformer: A Unet based dilated complex real dual-path conformer network for simultaneous speech enhancement and dereverberation

Complex spectrum and magnitude are considered as two major features of s...

Real-time Streaming Wave-U-Net with Temporal Convolutions for Multichannel Speech Enhancement

In this paper, we describe the work that we have done to participate in ...

Deep Residual-Dense Lattice Network for Speech Enhancement

Convolutional neural networks (CNNs) with residual links (ResNets) and c...

Efficient Encoder-Decoder and Dual-Path Conformer for Comprehensive Feature Learning in Speech Enhancement

Current speech enhancement (SE) research has largely neglected channel a...

Explicit Estimation of Magnitude and Phase Spectra in Parallel for High-Quality Speech Enhancement

Phase information has a significant impact on speech perceptual quality ...

Speech Denoising in the Waveform Domain with Self-Attention

In this work, we present CleanUNet, a causal speech denoising model on t...

Real Time Speech Enhancement in the Waveform Domain

We present a causal speech enhancement model working on the raw waveform...
