Acoustically Grounded Word Embeddings for Improved Acoustics-to-Word Speech Recognition

by Shane Settle, et al.

Direct acoustics-to-word (A2W) systems for end-to-end automatic speech recognition are simpler to train, and more efficient to decode with, than sub-word systems. However, A2W systems can have difficulties at training time when data is limited, and at decoding time when recognizing words outside the training vocabulary. To address these shortcomings, we investigate the use of recently proposed acoustic and acoustically grounded word embedding techniques in A2W systems. The idea is based on treating the final pre-softmax weight matrix of an A2W recognizer as a matrix of word embedding vectors, and using an externally trained set of word embeddings to improve the quality of this matrix. In particular, we introduce two ideas: (1) enforcing similarity at training time between the external embeddings and the recognizer weights, and (2) using the word embeddings at test time for predicting out-of-vocabulary words. Our word embedding model is acoustically grounded, that is, it is learned jointly with acoustic embeddings so as to encode the words' acoustic-phonetic content; and it is parametric, so that it can embed any arbitrary (potentially out-of-vocabulary) sequence of characters. We find that both techniques improve the performance of an A2W recognizer on conversational telephone speech.
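
The two ideas in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The abstract does not specify the form of the similarity term or the out-of-vocabulary decision rule, so both are assumptions here: a mean squared L2 distance between matching rows, and a fallback to the highest-scoring external embedding whenever the recognizer's top word is an unknown-word token. All names (`similarity_penalty`, `predict_word`, the `<unk>` token) are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Vector]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def similarity_penalty(W: Matrix, E: Matrix) -> float:
    """Assumed regularizer: mean squared L2 distance between each row of the
    recognizer's pre-softmax weight matrix W and the matching row of the
    external word embedding matrix E (one row per vocabulary word)."""
    if len(W) != len(E):
        raise ValueError("W and E must have one row per vocabulary word")
    total = 0.0
    for w_row, e_row in zip(W, E):
        total += sum((w - e) ** 2 for w, e in zip(w_row, e_row))
    return total / len(W)


def argmax_word(h: Vector, matrix: Matrix, words: List[str]) -> str:
    """Return the word whose row has the largest inner product with h."""
    scores = [dot(row, h) for row in matrix]
    return words[scores.index(max(scores))]


def predict_word(h: Vector, W: Matrix, vocab: List[str],
                 E_ext: Matrix, ext_vocab: List[str],
                 unk: str = "<unk>") -> str:
    """Assumed decision rule: score the hidden vector h against the recognizer
    weights; if the best in-vocabulary word is the unknown token, fall back to
    the best match among embeddings of an extended word list."""
    best = argmax_word(h, W, vocab)
    if best != unk:
        return best
    return argmax_word(h, E_ext, ext_vocab)
```

During training, a penalty of this kind would be added to the recognizer's usual loss with a tunable weight. At test time, `E_ext` would hold embeddings produced by the parametric character-sequence embedding model for words outside the training vocabulary.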




Whole-Word Segmental Speech Recognition with Acoustic Word Embeddings

Segmental models are sequence prediction models in which scores of hypot...

Neural approaches to spoken content embedding

Comparing spoken segments is a central operation to speech processing. T...

Improvements to Embedding-Matching Acoustic-to-Word ASR Using Multiple-Hypothesis Pronunciation-Based Embeddings

In embedding-matching acoustic-to-word (A2W) ASR, every word in the voca...

Temporal dynamics of semantic relations in word embeddings: an application to predicting armed conflict participants

This paper deals with using word embedding models to trace the temporal ...

Sequence-to-sequence Automatic Speech Recognition with Word Embedding Regularization and Fused Decoding

In this paper, we investigate the benefit that off-the-shelf word embedd...

Acoustic Neighbor Embeddings

This paper proposes a novel acoustic word embedding called Acoustic Neig...

The emergent algebraic structure of RNNs and embeddings in NLP

We examine the algebraic and geometric properties of a uni-directional G...
