Closing the Gap Between Time-Domain Multi-Channel Speech Enhancement on Real and Simulation Conditions

10/27/2021
by Wangyou Zhang, et al.

Deep-learning-based time-domain models, e.g., Conv-TasNet, have shown great potential in both single-channel and multi-channel speech enhancement. However, many experiments on time-domain speech enhancement models are conducted under simulated conditions, and it is not well studied whether their good performance generalizes to real-world scenarios. In this paper, we investigate the application of multi-channel Conv-TasNet-based speech enhancement to both simulated and real data. Our preliminary experiments show a large gap in automatic speech recognition (ASR) performance between the two conditions. We apply several approaches to close this gap, including integrating the multi-channel Conv-TasNet into a beamforming model with various strategies and jointly training the speech enhancement and speech recognition models. Our experiments on the CHiME-4 corpus show that the proposed approaches greatly reduce the discrepancy in speech recognition performance between simulated and real data while preserving the strong speech enhancement capability of the frontend.
