Optimising The Input Window Alignment in CD-DNN Based Phoneme Recognition for Low Latency Processing

06/29/2016
by Akash Kumar Dhaka, et al.

We present a systematic analysis of the performance of a phonetic recogniser when the window of input features is not symmetric with respect to the current frame. The recogniser is based on Context Dependent Deep Neural Networks (CD-DNNs) and Hidden Markov Models (HMMs). The objective is to reduce the latency of the system by reducing the number of future feature frames required to estimate the current output. Our tests on the TIMIT database show that performance does not degrade when the input window is shifted by up to 5 frames into the past relative to common practice, that is, to the point where no future frame is used. This corresponds to reducing the latency by 50 ms in our settings. Our tests also show that the best results are obtained not with the commonly employed symmetric window, but with an asymmetric window of eight past and two future context frames, although this observation should be confirmed on other data sets. The reduction in latency suggested by our results is critical for specific applications such as real-time lip synchronisation for tele-presence, and may also benefit general applications by reducing the lag in human-machine spoken interaction.
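To make the idea of shifting the input window concrete, the following is a minimal sketch of frame splicing with an asymmetric context, not the authors' implementation. The function names, the edge padding by frame repetition, and the toy one-dimensional features are all hypothetical. The 10 ms frame shift is inferred from the statement that 5 frames correspond to 50 ms, and the symmetric baseline of five past and five future frames is an assumption chosen to match the 11-frame total implied by the eight-past, two-future window.

```python
def splice_frames(features, n_past, n_future):
    """Stack each frame with n_past preceding and n_future following frames.

    features: list of per-frame feature vectors (lists of floats).
    Utterance edges are padded by repeating the first or last frame
    (an assumption; the source does not describe edge handling).
    """
    n = len(features)
    spliced = []
    for t in range(n):
        window = []
        for offset in range(-n_past, n_future + 1):
            idx = min(max(t + offset, 0), n - 1)  # clamp at utterance edges
            window.extend(features[idx])
        spliced.append(window)
    return spliced


def lookahead_latency_ms(n_future, frame_shift_ms=10.0):
    """Delay incurred by waiting for future frames before producing an output.

    frame_shift_ms=10.0 is an assumed value, inferred from "5 frames = 50 ms".
    """
    return n_future * frame_shift_ms


# Toy input: 20 frames whose single feature equals the frame index.
frames = [[float(t)] for t in range(20)]

symmetric = splice_frames(frames, n_past=5, n_future=5)    # assumed baseline
asymmetric = splice_frames(frames, n_past=8, n_future=2)   # window from the abstract
causal = splice_frames(frames, n_past=10, n_future=0)      # no future frame

for name, n_future in [("symmetric", 5), ("asymmetric", 2), ("causal", 0)]:
    print(name, "look-ahead latency:", lookahead_latency_ms(n_future), "ms")
```

All three configurations feed the network the same number of frames (11 in this sketch); only the position of the current frame inside the window changes, and with it the number of future frames the system must wait for.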


