Internal Language Model Training for Domain-Adaptive End-to-End Speech Recognition

02/02/2021
by Zhong Meng, et al.

The efficacy of external language model (LM) integration with existing end-to-end (E2E) automatic speech recognition (ASR) systems can be improved significantly using the internal language model estimation (ILME) method. In this method, during inference, the internal LM score is subtracted from the score obtained by interpolating the E2E score with the external LM score. To improve ILME-based inference, we propose an internal LM training (ILMT) method that minimizes an additional internal LM loss by updating only the E2E model components that affect the internal LM estimation. ILMT encourages the E2E model to form a standalone LM inside its existing components without sacrificing ASR accuracy. After ILMT, the more modular E2E model, with matched training and inference criteria, enables a more thorough elimination of the source-domain internal LM and therefore a more effective integration of the target-domain external LM. In experiments with recurrent neural network transducer and attention-based encoder-decoder models trained on 30K hours of data, ILMT with ILME-based inference achieves up to 31.5% and 11.4% relative word error rate reductions from standard E2E training with Shallow Fusion on out-of-domain LibriSpeech and in-domain Microsoft production test sets, respectively.
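The ILME-based scoring rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the interpolation weights, and the toy log-probabilities are all assumptions chosen for illustration. The sketch only shows how subtracting a weighted internal LM score can change which hypothesis wins compared with Shallow Fusion.

```python
# Minimal sketch of hypothesis scoring with Shallow Fusion vs. ILME.
# All names, weights, and numbers below are hypothetical illustrations.

def shallow_fusion_score(log_p_e2e, log_p_ext, ext_weight):
    """Interpolate the E2E log-probability with the external LM log-probability."""
    return log_p_e2e + ext_weight * log_p_ext


def ilme_score(log_p_e2e, log_p_ext, log_p_int, ext_weight, int_weight):
    """Shallow Fusion score minus the weighted internal LM log-probability."""
    return shallow_fusion_score(log_p_e2e, log_p_ext, ext_weight) - int_weight * log_p_int


def best_hypothesis(hyps, ext_weight, int_weight):
    """Return the text of the hypothesis with the highest ILME score.

    Setting int_weight to 0.0 reduces this to plain Shallow Fusion.
    """
    best = max(
        hyps,
        key=lambda h: ilme_score(h["e2e"], h["ext"], h["int"], ext_weight, int_weight),
    )
    return best["text"]


# Toy log-probabilities (assumed): hypothesis A is favored by the source-domain
# internal LM, hypothesis B by the target-domain external LM.
hyps = [
    {"text": "A", "e2e": -4.0, "ext": -6.0, "int": -2.0},
    {"text": "B", "e2e": -4.5, "ext": -3.0, "int": -6.0},
]

sf_choice = best_hypothesis(hyps, ext_weight=0.1, int_weight=0.0)
ilme_choice = best_hypothesis(hyps, ext_weight=0.1, int_weight=0.1)
print(sf_choice, ilme_choice)
```

With these toy numbers, Shallow Fusion keeps hypothesis A, while removing the internal LM contribution lets the external LM tip the decision toward hypothesis B. In practice the weights would be tuned on held-out data, and the internal LM score would come from the E2E model itself rather than from fixed values.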


research
11/03/2020

Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition

The external language models (LM) integration remains a challenging task...
research
01/26/2022

Internal language model estimation through explicit context vector learning for attention-based encoder-decoder ASR

An end-to-end (E2E) speech recognition model implicitly learns a biased ...
research
11/02/2022

Internal Language Model Estimation based Adaptive Language Model Fusion for Domain Adaptation

ASR model deployment environment is ever-changing, and the incoming spee...
research
04/19/2023

CB-Conformer: Contextual biasing Conformer for biased word recognition

Due to the mismatch between the source and target domains, how to better...
research
04/07/2021

Librispeech Transducer Model with Internal Language Model Prior Correction

We present our transducer model on Librispeech. We study variants to inc...
research
08/25/2023

Decoupled Structure for Improved Adaptability of End-to-End Models

Although end-to-end (E2E) trainable automatic speech recognition (ASR) h...
research
06/15/2022

Residual Language Model for End-to-end Speech Recognition

End-to-end automatic speech recognition suffers from adaptation to unkno...
