A spelling correction model for end-to-end speech recognition

02/19/2019
by Jinxi Guo, et al.

Attention-based sequence-to-sequence models for speech recognition jointly train an acoustic model, language model (LM), and alignment mechanism using a single neural network and require only parallel audio-text pairs. Thus, the language model component of the end-to-end model is trained only on transcribed audio-text pairs, which leads to performance degradation, especially on rare words. While a variety of work has looked at incorporating an external LM trained on text-only data into the end-to-end framework, none of it has taken into account the characteristic error distribution of the model. In this paper, we propose a novel approach to utilizing text-only data: training a spelling correction (SC) model to explicitly correct those errors. On the LibriSpeech dataset, we demonstrate that the proposed model results in an 18.6% relative improvement in WER over the baseline model when directly correcting the top ASR hypothesis, and a 29.0% relative improvement when further rescoring an expanded n-best list using an external LM.
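
The final rescoring step mentioned in the abstract can be illustrated with a minimal sketch. It assumes each hypothesis in the n-best list carries a log-probability from the correction model and that an external LM exposes a log-probability function; the log-linear combination, the names `rescore_nbest` and `toy_lm_logprob`, the weight `lm_weight`, and the toy vocabulary are all hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Tuple


def rescore_nbest(
    nbest: List[Tuple[str, float]],
    lm_logprob: Callable[[str], float],
    lm_weight: float = 0.5,  # assumed interpolation weight, not from the paper
) -> List[Tuple[str, float]]:
    """Re-rank (hypothesis, model_logprob) pairs using an external LM score."""
    rescored = [
        (text, model_lp + lm_weight * lm_logprob(text))
        for text, model_lp in nbest
    ]
    # Highest combined score first.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Toy stand-in for an external LM: out-of-vocabulary words are penalised more.
VOCAB = {"the", "cat", "sat", "on", "mat"}


def toy_lm_logprob(text: str) -> float:
    return sum(-1.0 if word in VOCAB else -5.0 for word in text.split())


# The misspelled hypothesis has the better model score before rescoring.
nbest = [("the cat sat on the matt", -1.0), ("the cat sat on the mat", -1.5)]
best_text, best_score = rescore_nbest(nbest, toy_lm_logprob)[0]
```

In this toy case the LM penalty on the out-of-vocabulary word outweighs the small model-score advantage of the misspelled hypothesis, so the correctly spelled one is ranked first. In practice the weight would be tuned on held-out data.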


