FSR: Accelerating the Inference Process of Transducer-Based Models by Applying Fast-Skip Regularization

by   Zhengkun Tian, et al.

Transducer-based models, such as RNN-Transducer and transformer-transducer, have achieved great success in speech recognition. A typical transducer model decodes the output sequence conditioned on the current acoustic state and previously predicted tokens step by step. Statistically, The number of blank tokens in the prediction results accounts for nearly 90% of all tokens. It takes a lot of computation and time to predict the blank tokens, but only the non-blank tokens will appear in the final output sequence. Therefore, we propose a method named fast-skip regularization, which tries to align the blank position predicted by a transducer with that predicted by a CTC model. During the inference, the transducer model can predict the blank tokens in advance by a simple CTC project layer without many complicated forward calculations of the transducer decoder and then skip them, which will reduce the computation and improve the inference speed greatly. All experiments are conducted on a public Chinese mandarin dataset AISHELL-1. The results show that the fast-skip regularization can indeed help the transducer model learn the blank position alignments. Besides, the inference with fast-skip can be speeded up nearly 4 times with only a little performance degradation.


page 1

page 2

page 3

page 4


TSNAT: Two-Step Non-Autoregressvie Transformer Models for Speech Recognition

The autoregressive (AR) models, such as attention-based encoder-decoder ...

Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition

Transformers have recently dominated the ASR field. Although able to yie...

Efficient Sequence Transduction by Jointly Predicting Tokens and Durations

This paper introduces a novel Token-and-Duration Transducer (TDT) archit...

Peak-First CTC: Reducing the Peak Latency of CTC Models by Applying Peak-First Regularization

The CTC model has been widely applied to many application scenarios beca...

Optimized Table Tokenization for Table Structure Recognition

Extracting tables from documents is a crucial task in any document conve...

Transkimmer: Transformer Learns to Layer-wise Skim

Transformer architecture has become the de-facto model for many machine ...

Confidence-Aware Scheduled Sampling for Neural Machine Translation

Scheduled sampling is an effective method to alleviate the exposure bias...

Please sign up or login with your details

Forgot password? Click here to reset