Towards Contextual Spelling Correction for Customization of End-to-end Speech Recognition Systems

03/02/2022
by Xiaoqiang Wang, et al.

Contextual biasing is an important and challenging task for end-to-end automatic speech recognition (ASR) systems. It aims to improve recognition performance by biasing the ASR system towards particular context phrases such as person names, music lists, and proper nouns. Existing methods mainly include contextual LM biasing and adding a bias encoder to the end-to-end ASR model. In this work, we introduce a novel approach to contextual biasing: adding a contextual spelling correction model on top of the end-to-end ASR system. We incorporate contextual information into a sequence-to-sequence spelling correction model with a shared context encoder. Our proposed model includes two different mechanisms: autoregressive (AR) and non-autoregressive (NAR). We propose filtering algorithms to handle large context lists, and performance balancing mechanisms to control the biasing degree of the model. We demonstrate that the proposed model is a general biasing solution which is domain-insensitive and can be adopted in different scenarios. Experiments show that the proposed method achieves as much as 51% relative word error rate (WER) reduction over the ASR system and outperforms traditional biasing methods. Compared to the AR solution, the proposed NAR model reduces model size by 43.2% and speeds up inference by 2.1 times.
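To make the idea of filtering a large context list concrete, here is a minimal sketch in Python. It is not the filtering algorithm from the paper, which the abstract does not describe. It assumes a simple relevance heuristic: rank each context phrase by its normalized character edit distance to the closest same-length word window in the ASR hypothesis, and keep only the top few. The function names (`edit_distance`, `phrase_score`, `filter_context`) and the `top_k` parameter are hypothetical and chosen for illustration only.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance, computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phrase_score(hypothesis: str, phrase: str) -> float:
    """Smallest normalized distance between the phrase and any hypothesis
    window with the same number of words (lower means more relevant)."""
    hyp_words = hypothesis.lower().split()
    target = phrase.lower()
    n = max(1, len(target.split()))
    # Slide an n-word window over the hypothesis; a hypothesis shorter
    # than n words yields a single window holding all of its words.
    windows = [" ".join(hyp_words[i:i + n])
               for i in range(max(1, len(hyp_words) - n + 1))]
    return min(edit_distance(w, target) / max(len(w), len(target), 1)
               for w in windows)


def filter_context(hypothesis: str, phrases: List[str], top_k: int = 2) -> List[str]:
    """Keep the top_k phrases most similar to some part of the hypothesis.
    Assumption: this heuristic stands in for the paper's own filtering."""
    return sorted(phrases, key=lambda p: phrase_score(hypothesis, p))[:top_k]


if __name__ == "__main__":
    context = ["Mary Jones", "Joan Smythe", "Play Music", "John Smith"]
    print(filter_context("call john smyth now", context))
```

In a setup like this, only the surviving phrases would be passed to the context encoder, which keeps the cost of the correction model from growing with the full size of the list. The window-matching rule and the cut-off are design choices of this sketch, not details taken from the paper.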

