Student-Teacher Learning for BLSTM Mask-based Speech Enhancement

Spectral mask estimation using bidirectional long short-term memory (BLSTM) neural networks has been widely used in various speech enhancement applications, and it has been particularly successful in multichannel enhancement with mask-based beamformers. However, when these masks are used for single-channel speech enhancement, they severely distort the speech signal, making the output unsuitable for speech recognition. This paper proposes a student-teacher learning paradigm for single-channel speech enhancement. The beamformed signal from multichannel enhancement is given as input to the teacher network to obtain soft masks. An additional cross-entropy loss term with the soft-mask target is combined with the original loss, so that the student network, which receives single-channel input, is trained to mimic the soft mask that the teacher obtains from the beamformed multichannel input. Experiments on the CHiME-4 challenge single-channel track data show improved ASR performance.
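The combined training objective described above can be sketched as follows. This is a minimal illustration under assumptions that the abstract does not state: masks are flattened lists of time-frequency values in [0, 1], the "original loss" is taken to be binary cross-entropy against an ideal binary mask (a common choice for BLSTM mask estimators), and the two terms are mixed with a hypothetical weight `alpha`. The function names are hypothetical; in practice the teacher's soft mask would come from running the teacher network on the beamformed signal.

```python
import math


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean binary cross-entropy over time-frequency bins.

    `pred` holds mask values predicted by the network and `target` holds
    targets in [0, 1]; targets may be hard (0/1) or soft (teacher output).
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and of equal length")
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(pred)


def student_loss(student_mask, ideal_mask, teacher_soft_mask, alpha=0.5):
    """Hypothetical combined objective for the single-channel student.

    original: BCE against an ideal binary mask (assumed original loss)
    distill:  BCE against the teacher's soft mask (the additional term)
    alpha:    hypothetical interpolation weight between the two terms
    """
    original = binary_cross_entropy(student_mask, ideal_mask)
    distill = binary_cross_entropy(student_mask, teacher_soft_mask)
    return (1.0 - alpha) * original + alpha * distill


if __name__ == "__main__":
    # Toy values for three T-F bins (illustrative only, not from the paper).
    student = [0.8, 0.3, 0.6]
    ideal = [1.0, 0.0, 1.0]
    teacher = [0.9, 0.2, 0.7]
    print(student_loss(student, ideal, teacher, alpha=0.5))
```

With `alpha = 0` the objective reduces to the assumed original loss; increasing `alpha` pulls the student's mask toward the teacher's soft mask, since cross-entropy against a fixed soft target is minimized when the prediction equals that target.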
