Phoneme-based Distribution Regularization for Speech Enhancement

04/08/2021
by Yajing Liu, et al.

Existing speech enhancement methods mainly separate speech from noise at the signal level or in the time-frequency domain. They seldom pay attention to the semantic information of a corrupted signal. In this paper, we aim to bridge this gap by extracting phoneme identities to help speech enhancement. Specifically, we propose a phoneme-based distribution regularization (PbDr) for speech enhancement, which incorporates frame-wise phoneme information into a speech enhancement network in a conditional manner. As different phonemes always lead to different feature distributions in frequency, we propose to learn a parameter pair, i.e., scale and bias, from a phoneme classification vector to modulate the speech enhancement network. The modulation parameter pair includes not only frame-wise but also frequency-wise conditions, which effectively map features to phoneme-related distributions. In this way, we explicitly regularize speech enhancement features by recognition vectors. Experiments on public datasets demonstrate that the proposed PbDr module boosts not only the perceptual quality of the enhanced speech but also the recognition accuracy of an ASR system on that speech. The PbDr module could be readily incorporated into other speech enhancement networks as well.
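The scale-and-bias modulation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`linear`, `modulate`), the frames-by-frequency-bins feature layout, the single affine map from phoneme vector to parameters, and the toy sizes are all assumptions made for illustration, and the weights are random placeholders rather than learned values.

```python
import random

def linear(x, weight, bias):
    """Affine map from a length-P vector to a length-F vector.

    weight holds F rows of P values; bias holds F values.
    """
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def modulate(features, phoneme_probs, w_scale, b_scale, w_bias, b_bias):
    """Apply a frame-wise, frequency-wise scale and bias to features.

    features:      T frames x F frequency bins (hypothetical layout)
    phoneme_probs: T frames x P phoneme classes (classification vectors)
    """
    out = []
    for frame, probs in zip(features, phoneme_probs):
        scale = linear(probs, w_scale, b_scale)  # one scale per frequency bin
        bias = linear(probs, w_bias, b_bias)     # one bias per frequency bin
        out.append([s * f + b for s, f, b in zip(scale, frame, bias)])
    return out

# Toy sizes, chosen only for illustration.
T, F, P = 3, 4, 5
rng = random.Random(0)
features = [[rng.gauss(0, 1) for _ in range(F)] for _ in range(T)]
# One-hot phoneme vectors stand in for a phoneme classifier's output.
phoneme_probs = [[1.0 if p == t % P else 0.0 for p in range(P)]
                 for t in range(T)]
# In a real system these would be learned; here they are random placeholders.
w_scale = [[rng.gauss(0, 0.1) for _ in range(P)] for _ in range(F)]
w_bias = [[rng.gauss(0, 0.1) for _ in range(P)] for _ in range(F)]
b_scale = [1.0] * F  # start near identity scaling
b_bias = [0.0] * F   # start with no shift

modulated = modulate(features, phoneme_probs, w_scale, b_scale, w_bias, b_bias)
```

Because the scale and bias are computed per frame and have one entry per frequency bin, each frame's features are shifted toward a distribution that depends on that frame's phoneme vector, which is the conditioning idea the abstract describes.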

research
10/27/2021

Closing the Gap Between Time-Domain Multi-Channel Speech Enhancement on Real and Simulation Conditions

The deep learning based time-domain models, e.g. Conv-TasNet, have shown...
research
11/15/2021

Time-Frequency Attention for Monaural Speech Enhancement

Most studies on speech enhancement generally don't consider the energy d...
research
02/20/2023

Real-Time Speech Enhancement Using Spectral Subtraction with Minimum Statistics and Spectral Floor

An initial real-time speech enhancement method is presented to reduce th...
research
06/16/2022

Adversarial Privacy Protection on Speech Enhancement

Speech is easily leaked imperceptibly, such as being recorded by mobile ...
research
02/16/2018

Constrained Convolutional-Recurrent Networks to Improve Speech Quality with Low Impact on Recognition Accuracy

For a speech-enhancement algorithm, it is highly desirable to simultaneo...
research
10/30/2022

SRTNet: Time Domain Speech Enhancement Via Stochastic Refinement

Diffusion model, as a new generative model which is very popular in imag...
research
06/22/2021

Learning to Inference with Early Exit in the Progressive Speech Enhancement

In real scenarios, it is often necessary and significant to control the ...
