A Multi-dimensional Deep Structured State Space Approach to Speech Enhancement Using Small-footprint Models

06/01/2023
by Pin-Jui Ku, et al.

We propose a multi-dimensional structured state space (S4) approach to speech enhancement. To better capture spectral dependencies along the frequency axis, we modify the multi-dimensional S4 layer with a whitening transformation and use it to build small-footprint models that still achieve good performance. We explore several S4-based deep architectures in the time (T) and time-frequency (TF) domains. The 2-D S4 layer can be viewed as a particular convolutional layer with an infinite receptive field, even though it uses fewer parameters than a conventional convolutional layer. Evaluated on the VoiceBank-DEMAND data set, the proposed TF-domain S4-based model is 78.6% smaller than the conventional U-net model based on convolutional layers, yet it still achieves competitive results, with a PESQ score of 3.15 when trained with data augmentation. By increasing the model size, we can even reach a PESQ score of 3.18.
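The remark that an S4 layer acts like a convolution with an unbounded receptive field can be illustrated with a small NumPy sketch. It is not the architecture from the paper: it uses a plain diagonal discrete state space model (no S4 parameterization, no whitening transformation), and it assumes that a 2-D kernel is formed as the outer product of two 1-D kernels, as in S4ND-style layers. All names and sizes (`ssm_kernel`, `ssm_recurrence`, `N`, `T`, `F`) are hypothetical and chosen for illustration only.

```python
import numpy as np

def ssm_kernel(A, B, C, length):
    """Convolution kernel K[l] = C @ A^l @ B for l = 0..length-1."""
    kernel = np.empty(length)
    state = B.copy()
    for l in range(length):
        kernel[l] = C @ state
        state = A @ state
    return kernel

def ssm_recurrence(A, B, C, u):
    """Run x_k = A x_{k-1} + B u_k, y_k = C x_k step by step."""
    x = np.zeros_like(B)
    y = np.empty(len(u))
    for k, u_k in enumerate(u):
        x = A @ x + B * u_k
        y[k] = C @ x
    return y

rng = np.random.default_rng(0)
N, T, F = 4, 64, 32                  # state size, time frames, frequency bins (illustrative)
A = np.diag([0.9, 0.5, -0.3, 0.1])   # stable diagonal state matrix (assumption)
B = np.ones(N)
C = rng.standard_normal(N)

# 1-D case: the recurrence equals a causal convolution whose kernel
# is as long as the input, so every output sees the whole past.
u = rng.standard_normal(T)
k_time = ssm_kernel(A, B, C, T)
y_rec = ssm_recurrence(A, B, C, u)
y_conv = np.convolve(u, k_time)[:T]
assert np.allclose(y_rec, y_conv)

# 2-D case (assumed outer-product form): a kernel covering the full
# T x F grid is generated from two small state space models.
C_f = rng.standard_normal(N)
k_freq = ssm_kernel(A, B, C_f, F)
kernel_2d = np.outer(k_time, k_freq)

n_ssm_params = 2 * 3 * N             # diagonal A, B, C for each axis: 24
n_dense_params = T * F               # a dense kernel of the same extent: 2048
```

In this toy setting the kernel length grows with the input while the parameter count stays fixed by the state size, which is the sense in which such a layer can have a very large receptive field with few parameters.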

research
02/03/2021

Monaural Speech Enhancement with Complex Convolutional Block Attention Module and Joint Time Frequency Losses

Deep complex U-Net structure and convolutional recurrent network (CRN) s...
research
05/06/2021

Speech Enhancement using Separable Polling Attention and Global Layer Normalization followed with PReLU

Single channel speech enhancement is a challenging task in speech commun...
research
07/28/2023

PCNN: A Lightweight Parallel Conformer Neural Network for Efficient Monaural Speech Enhancement

Convolutional neural networks (CNN) and Transformer have wildly succeede...
research
11/16/2018

Using recurrences in time and frequency within U-net architecture for speech enhancement

When designing fully-convolutional neural network, there is a trade-off ...
research
09/07/2023

Spiking Structured State Space Model for Monaural Speech Enhancement

Speech enhancement seeks to extract clean speech from noisy signals. Tra...
research
02/03/2020

Tensor-to-Vector Regression for Multi-channel Speech Enhancement based on Tensor-Train Network

We propose a tensor-to-vector regression approach to multi-channel speec...
research
02/18/2023

Multi-dimensional frequency dynamic convolution with confident mean teacher for sound event detection

Recently, convolutional neural networks (CNNs) have been widely used in ...
