Acoustic Landmarks Contain More Information About the Phone String than Other Frames for Automatic Speech Recognition with Deep Neural Network Acoustic Model

10/27/2017
by Di He, et al.

Most mainstream Automatic Speech Recognition (ASR) systems consider all feature frames equally important. Acoustic landmark theory, however, rests on the contrary idea that some frames are more important than others. It exploits quantal non-linearities in the articulatory-acoustic and acoustic-perceptual relations to define landmark times at which the speech spectrum abruptly changes or reaches an extremum; frames overlapping landmarks have been demonstrated to be sufficient for speech perception. In this work, we conduct experiments on the TIMIT corpus with both GMM-based and DNN-based ASR systems and find that frames containing landmarks are more informative for ASR than other frames. We find that altering the level of emphasis on landmarks by re-weighting the acoustic likelihood tends to reduce the phone error rate (PER). Furthermore, by leveraging the landmark as a heuristic, one of our hybrid DNN frame-dropping strategies maintained a PER within 0.44% of optimal when scoring less than half (45.8% to be precise) of the frames. This hybrid strategy out-performs other non-heuristic-based methods and demonstrates the potential of landmarks for reducing computation.
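The two ideas in the abstract, re-weighting acoustic likelihoods at landmark frames and scoring only a subset of frames chosen with landmarks as a heuristic, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `landmark_weight` and `keep_every` parameters, and the rule of keeping every landmark frame plus a regular subsample of the remaining frames are all assumptions made for illustration.

```python
def reweight_log_likelihoods(log_likes, is_landmark, landmark_weight=1.5):
    """Scale per-frame acoustic log-likelihoods.

    log_likes: list of frames, each a list of per-state log-likelihoods.
    is_landmark: list of booleans, True where a frame overlaps a landmark.
    landmark_weight: hypothetical emphasis factor; 1.0 leaves scores unchanged.
    Non-landmark frames keep a weight of 1.0.
    """
    return [
        [landmark_weight * ll if lm else ll for ll in frame]
        for frame, lm in zip(log_likes, is_landmark)
    ]


def select_frames_to_score(is_landmark, keep_every=2):
    """Return indices of frames to score with the acoustic model.

    Assumed rule (hypothetical): keep every landmark frame, and keep one
    out of every `keep_every` non-landmark frames.
    """
    kept = []
    non_landmark_seen = 0
    for index, lm in enumerate(is_landmark):
        if lm:
            kept.append(index)
        else:
            if non_landmark_seen % keep_every == 0:
                kept.append(index)
            non_landmark_seen += 1
    return kept


def fraction_scored(is_landmark, keep_every=2):
    """Fraction of frames that would be scored under the assumed rule."""
    return len(select_frames_to_score(is_landmark, keep_every)) / len(is_landmark)
```

With a toy landmark mask such as `[False, True, False, False, True, False]`, `select_frames_to_score` keeps both landmark frames and every other non-landmark frame, and `fraction_scored` reports the share of frames scored, which is the quantity the abstract states as 45.8% for the hybrid strategy.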


