A comparison of handcrafted, parameterized, and learnable features for speech separation

11/29/2020
by Wenbo Zhu, et al.

The design of acoustic features is important for speech separation. Acoustic features can be roughly categorized into three classes: handcrafted, parameterized, and learnable. Among them, learnable features, which are trained jointly with the separation network in an end-to-end fashion, have become a trend in modern speech separation research, e.g. the convolutional time-domain audio separation network (Conv-TasNet), while handcrafted and parameterized features have also been shown to be competitive in very recent studies. However, a systematic comparison across the three kinds of acoustic features has not yet been conducted. In this paper, we compare them within the Conv-TasNet framework by setting its encoder and decoder to different acoustic features. We also generalize the handcrafted multi-phase gammatone filterbank (MPGTF) to a new parameterized multi-phase gammatone filterbank (ParaMPGTF). Experimental results on the WSJ0-2mix corpus show that (i) if the decoder is learnable, then setting the encoder to STFT, MPGTF, ParaMPGTF, or learnable features leads to similar performance; and (ii) when the pseudo-inverse transforms of STFT, MPGTF, and ParaMPGTF are used as the decoders, the proposed ParaMPGTF performs better than the two handcrafted features.
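
The idea of pairing a fixed encoder with its pseudo-inverse as the decoder can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `stft_like_filterbank`, the frame length of 16, and the Hamming window are assumptions chosen for illustration, and the separation network that would sit between encoder and decoder in Conv-TasNet is omitted. The sketch only shows that a handcrafted, STFT-like analysis matrix with full column rank can be inverted by its Moore-Penrose pseudo-inverse.

```python
import numpy as np


def stft_like_filterbank(frame_len):
    """Build a real-valued, STFT-like analysis matrix (hypothetical helper).

    Rows are windowed cosines and sines at the DFT frequencies, so applying
    the matrix to a frame gives the real and imaginary parts of its windowed
    DFT, stacked into one feature vector.
    """
    n = np.arange(frame_len)                    # sample index within a frame
    k = np.arange(frame_len // 2 + 1)[:, None]  # non-negative DFT bins
    window = np.hamming(frame_len)              # strictly positive window
    real_part = np.cos(2 * np.pi * k * n / frame_len) * window
    imag_part = -np.sin(2 * np.pi * k * n / frame_len) * window
    return np.concatenate([real_part, imag_part], axis=0)


frame_len = 16                              # assumed frame length
W_enc = stft_like_filterbank(frame_len)     # fixed encoder, shape (18, 16)
W_dec = np.linalg.pinv(W_enc)               # pseudo-inverse decoder, shape (16, 18)

rng = np.random.default_rng(0)
frame = rng.standard_normal(frame_len)      # stand-in for one frame of audio

features = W_enc @ frame                    # encoder output for this frame
reconstructed = W_dec @ features            # decoder output for this frame
max_error = np.max(np.abs(reconstructed - frame))
```

Because the window has no zero samples, the analysis matrix here has full column rank and the round trip recovers the frame up to floating-point error. In a separation system the features would be masked per speaker before decoding, so the decoder would then return the least-squares frame for the masked features rather than the input itself.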

research
05/15/2019

End-to-End Multi-Channel Speech Separation

The end-to-end approach for single-channel speech separation has been st...
research
03/31/2022

EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers

In this paper, we present a novel framework that jointly performs speake...
research
11/23/2020

End-to-end Silent Speech Recognition with Acoustic Sensing

Silent speech interfaces (SSI) has been an exciting area of recent inter...
research
09/30/2022

An efficient encoder-decoder architecture with top-down attention for speech separation

Deep neural networks have shown excellent prospects in speech separation...
research
07/26/2023

Exploring the Interactions between Target Positive and Negative Information for Acoustic Echo Cancellation

Acoustic echo cancellation (AEC) aims to remove interference signals whi...
research
10/25/2019

A Multi-Phase Gammatone Filterbank for Speech Separation via TasNet

In this work, we investigate if the learned encoder of the end-to-end co...
research
06/02/2023

HD-DEMUCS: General Speech Restoration with Heterogeneous Decoders

This paper introduces an end-to-end neural speech restoration model, HD-...
