Speech Enhancement for Wake-Up-Word detection in Voice Assistants

01/29/2021
by   David Bonet, et al.

Keyword spotting, and in particular Wake-Up-Word (WUW) detection, is an essential task for voice assistants. A common problem is that voice assistants are easily activated by accident when background noise such as music, TV, or background speech triggers the device. In this paper, we propose a Speech Enhancement (SE) model adapted to WUW detection that aims to increase the recognition rate and reduce false alarms in the presence of these types of noise. The SE model is a fully convolutional denoising auto-encoder that operates at the waveform level and is trained with log-Mel spectrogram and waveform reconstruction losses together with the binary cross-entropy (BCE) loss of a simple WUW classification network. A new database has been purposely prepared for recognizing the WUW in challenging conditions; it contains negative samples that are phonetically very similar to the keyword. The database is extended with public databases and exhaustive data augmentation to simulate different noises and environments. The results obtained by concatenating the SE model with both a simple and a state-of-the-art WUW detector show that the SE model does not harm the recognition rate in quiet environments and improves performance in the presence of noise, especially when the SE model and WUW detector are trained jointly end-to-end.
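
The training objective described in the abstract combines three terms: a waveform reconstruction loss, a log-Mel spectrogram reconstruction loss, and the BCE loss of the WUW classifier. The following is a minimal NumPy sketch of how such a combined loss could be computed for a single utterance. It is not the authors' implementation: the L1 distance, the loss weights (`w_wave`, `w_mel`, `w_bce`), the frame settings (`n_fft`, `hop`, `n_mels`), and all function names are assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel(wave, sr=16000, n_fft=512, hop=128, n_mels=40, eps=1e-6):
    """Log-Mel spectrogram, shape (n_frames, n_mels). Settings are assumed."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack([wave[t * hop : t * hop + n_fft] * window
                       for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power @ mel_filterbank(n_mels, n_fft, sr).T + eps)

def bce(prob, label, eps=1e-7):
    """Binary cross-entropy for one predicted WUW probability."""
    prob = np.clip(prob, eps, 1.0 - eps)
    return float(-(label * np.log(prob) + (1.0 - label) * np.log(1.0 - prob)))

def joint_loss(enhanced, clean, wuw_prob, wuw_label,
               w_wave=1.0, w_mel=1.0, w_bce=1.0):
    """Weighted sum of the three terms; weights and L1 distance are assumptions."""
    wave_term = np.mean(np.abs(enhanced - clean))
    mel_term = np.mean(np.abs(log_mel(enhanced) - log_mel(clean)))
    return float(w_wave * wave_term + w_mel * mel_term
                 + w_bce * bce(wuw_prob, wuw_label))

# Toy usage: one second of synthetic "clean" audio and a noisy estimate of it.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
noisy = clean + 0.1 * rng.standard_normal(16000)
loss = joint_loss(noisy, clean, wuw_prob=0.8, wuw_label=1.0)
```

In joint end-to-end training, `enhanced` would be the output of the SE model and `wuw_prob` the output of the WUW detector fed with that enhanced signal, so the gradient of the BCE term also reaches the SE model. A differentiable framework would be needed for that; the NumPy version above only illustrates how the terms are combined.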


