Deep context: end-to-end contextual speech recognition

08/07/2018
by Golan Pundak, et al.

In automatic speech recognition (ASR), what a user says depends on the particular context she is in. Typically, this context is represented as a set of word n-grams. In this work, we present a novel, all-neural, end-to-end (E2E) ASR system that utilizes such context. Our approach, which we refer to as Contextual Listen, Attend and Spell (CLAS), jointly optimizes the ASR components along with embeddings of the context n-grams. During inference, the CLAS system can be presented with context phrases which might contain out-of-vocabulary (OOV) terms not seen during training. We compare our proposed system to a more traditional contextualization approach, which performs shallow fusion between independently trained LAS and contextual n-gram models during beam search. Across a number of tasks, we find that the proposed CLAS system outperforms the baseline method by as much as 68% relative WER, indicating the advantage of joint optimization over individually trained components.

Index Terms: speech recognition, sequence-to-sequence models, listen attend and spell, LAS, attention, embedded speech recognition.
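To make the baseline concrete, the following is a minimal sketch of shallow-fusion scoring at a single beam-search step. It assumes the common log-linear form, in which the recognizer's token log-probability is added to a weighted log-probability from a separately trained context model. The function name `shallow_fusion_scores`, the weight `lam`, and the toy probabilities are illustrative assumptions, not taken from the paper.

```python
import math


def shallow_fusion_scores(las_log_probs, context_log_probs, lam=0.5):
    """Fuse per-token log-probabilities from two independently trained models.

    las_log_probs:     token -> log P_LAS(token | audio, prefix)
    context_log_probs: token -> log P_C(token | prefix) from the context model
    lam:               interpolation weight (hypothetical default)

    Tokens the context model does not score receive no bonus or penalty
    (an assumption made for this sketch).
    """
    return {
        tok: lp + lam * context_log_probs.get(tok, 0.0)
        for tok, lp in las_log_probs.items()
    }


# Toy example: the recognizer slightly prefers "cat", while the
# context model strongly prefers the contact name "kat".
las = {"cat": math.log(0.6), "kat": math.log(0.4)}
ctx = {"cat": math.log(0.1), "kat": math.log(0.9)}

fused = shallow_fusion_scores(las, ctx, lam=1.0)
best = max(fused, key=fused.get)
print(best)
```

With `lam=0.0` the context model is ignored and the recognizer's own preference wins. In this kind of setup the two models are trained separately and only combined at decoding time, which is the property the abstract contrasts with CLAS's joint optimization.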
