LSTM-based Whisper Detection

09/20/2018
by Zeynab Raeesy, et al.

This article presents a whisper speech detector for the far-field domain. The proposed system consists of a long short-term memory (LSTM) neural network trained on log-filterbank energy (LFBE) acoustic features. The model is trained and evaluated on recordings of human interactions with voice-controlled, far-field devices in whisper and normal phonation modes. We compare multiple inference approaches for utterance-level classification by examining trajectories of the LSTM posteriors. In addition, we engineer a set of features based on the signal characteristics inherent to whisper speech and evaluate their effectiveness in further separating whisper from normal speech. Benchmarking these features with multilayer perceptrons (MLPs) and LSTMs suggests that the proposed features, in combination with LFBE features, can further improve our classifiers. We show that, with enough data, the LSTM model is as capable of learning whisper characteristics from LFBE features alone as a simpler MLP model that uses both LFBE features and features engineered for separating whisper and normal speech. In addition, we show that the LSTM classifier's accuracy can be further improved by incorporating the proposed engineered features.
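
To make the idea of utterance-level classification from a trajectory of frame-level posteriors concrete, here is a minimal sketch in Python. It is an illustration only: the abstract does not name the inference approaches that were compared, so the three strategies below (last frame, mean posterior, majority vote), the function names, the example trajectory, and the 0.5 thresholds are all assumptions rather than the authors' method.

```python
from statistics import fmean

# Each strategy maps a list of per-frame whisper posteriors
# (floats in [0, 1], one per frame) to a single utterance-level score.
# All three strategies are hypothetical examples, not taken from the paper.

def last_frame(posteriors):
    """Use only the posterior emitted at the final frame."""
    return posteriors[-1]

def mean_posterior(posteriors):
    """Average the posteriors over the whole utterance."""
    return fmean(posteriors)

def majority_vote(posteriors, frame_threshold=0.5):
    """Fraction of frames individually classified as whisper."""
    votes = sum(1 for p in posteriors if p >= frame_threshold)
    return votes / len(posteriors)

def classify(posteriors, strategy=mean_posterior, threshold=0.5):
    """Label an utterance 'whisper' or 'normal' from its posterior trajectory."""
    if not posteriors:
        raise ValueError("empty posterior trajectory")
    return "whisper" if strategy(posteriors) >= threshold else "normal"

# Made-up trajectory: the model grows more confident as frames accumulate.
trajectory = [0.2, 0.4, 0.7, 0.9, 0.9]
for strategy in (last_frame, mean_posterior, majority_vote):
    print(strategy.__name__, classify(trajectory, strategy))
```

The strategies differ in how much they trust early frames: a last-frame rule relies on the recurrent state having accumulated evidence by the end of the utterance, while averaging and voting weigh every frame equally.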


