Thutmose Tagger: Single-pass neural model for Inverse Text Normalization

by   Alexandra Antonova, et al.

Inverse text normalization (ITN) is an essential post-processing step in automatic speech recognition (ASR). It converts numbers, dates, abbreviations, and other semiotic classes from the spoken form generated by ASR to their written forms. One can consider ITN as a Machine Translation task and use neural sequence-to-sequence models to solve it. Unfortunately, such neural models are prone to hallucinations that could lead to unacceptable errors. To mitigate this issue, we propose a single-pass token classifier model that regards ITN as a tagging task. The model assigns a replacement fragment to every input token or marks it for deletion or copying without changes. We present a dataset preparation method based on the granular alignment of ITN examples. The proposed model is less prone to hallucination errors. The model is trained on the Google Text Normalization dataset and achieves state-of-the-art sentence accuracy on both English and Russian test sets. One-to-one correspondence between tags and input words improves the interpretability of the model's predictions, simplifies debugging, and allows for post-processing corrections. The model is simpler than sequence-to-sequence models and easier to optimize in production settings. The model and the code to prepare the dataset is published as part of NeMo project.


page 1

page 2

page 3

page 4


Four-in-One: A Joint Approach to Inverse Text Normalization, Punctuation, Capitalization, and Disfluency for Automatic Speech Recognition

Features such as punctuation, capitalization, and formatting of entities...

Sequence-to-Sequence Models for Extracting Information from Registration and Legal Documents

A typical information extraction pipeline consists of token- or span-lev...

DeepNorm-A Deep Learning Approach to Text Normalization

This paper presents an simple yet sophisticated approach to the challeng...

Improving Robustness of Neural Inverse Text Normalization via Data-Augmentation, Semi-Supervised Learning, and Post-Aligning Method

Inverse text normalization (ITN) is crucial for converting spoken-form i...

Upcycle Your OCR: Reusing OCRs for Post-OCR Text Correction in Romanised Sanskrit

We propose a post-OCR text correction approach for digitising texts in R...

Mitigating the Impact of Speech Recognition Errors on Chatbot using Sequence-to-Sequence Model

We apply sequence-to-sequence model to mitigate the impact of speech rec...

Incorporating External POS Tagger for Punctuation Restoration

Punctuation restoration is an important post-processing step in automati...

Please sign up or login with your details

Forgot password? Click here to reset