Effects of Lombard Reflex on the Performance of Deep-Learning-Based Audio-Visual Speech Enhancement Systems

11/15/2018
by   Daniel Michelsanti, et al.
0

Humans tend to change their way of speaking when they are immersed in a noisy environment, a reflex known as Lombard effect. Current speech enhancement systems based on deep learning do not usually take into account this change in the speaking style, because they are trained with neutral (non-Lombard) speech utterances recorded under quiet conditions to which noise is artificially added. In this paper, we investigate the effects that the Lombard reflex has on the performance of audio-visual speech enhancement systems based on deep learning. The results show that a gap in the performance of as much as approximately 5 dB between the systems trained on neutral speech and the ones trained on Lombard speech exists. This indicates the benefit of taking into account the mismatch between neutral and Lombard speech in the design of audio-visual speech enhancement systems.

READ FULL TEXT
research
05/29/2019

Deep-Learning-Based Audio-Visual Speech Enhancement in Presence of Lombard Effect

When speaking in presence of background noise, humans reflexively change...
research
01/26/2022

A two-step backward compatible fullband speech enhancement system

Speech enhancement methods based on deep learning have surpassed traditi...
research
09/12/2023

Assessing the Generalization Gap of Learning-Based Speech Enhancement Systems in Noisy and Reverberant Environments

The acoustic variability of noisy and reverberant speech mixtures is inf...
research
06/04/2021

A Database for Research on Detection and Enhancement of Speech Transmitted over HF links

In this paper we present an open database for the development of detecti...
research
02/19/2021

Speech enhancement with weakly labelled data from AudioSet

Speech enhancement is a task to improve the intelligibility and perceptu...
research
08/30/2020

Improved Lite Audio-Visual Speech Enhancement

Numerous studies have investigated the effectiveness of audio-visual mul...
research
09/13/2018

Real-Time Lightweight Chaotic Encryption for 5G IoT Enabled Lip-Reading Driven Secure Hearing-Aid

Existing audio-only hearing-aids are known to perform poorly in noisy si...

Please sign up or login with your details

Forgot password? Click here to reset