Efficient Knowledge Distillation for RNN-Transducer Models

11/11/2020
by Sankaran Panchapagesan, et al.

Knowledge Distillation is an effective method of transferring knowledge from a large model to a smaller model. Distillation can be viewed as a type of model compression, and has played an important role for on-device ASR applications. In this paper, we develop a distillation method for RNN-Transducer (RNN-T) models, a popular end-to-end neural network architecture for streaming speech recognition. Our proposed distillation loss is simple and efficient, and uses only the "y" and "blank" posterior probabilities from the RNN-T output probability lattice. We study the effectiveness of the proposed approach in improving the accuracy of sparse RNN-T models obtained by gradually pruning a larger uncompressed model, which also serves as the teacher during distillation. With distillation of 60% and 90% sparse RNN-T models, we obtain WER reductions of 4.3% and 12.1% respectively, on a FarField eval set. We also present results of experiments on LibriSpeech, where the introduction of the distillation loss yields a 4.8% relative WER reduction on the test-other dataset for a small Conformer model.
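
The collapsed-lattice loss described above lends itself to a compact implementation. Below is a minimal PyTorch sketch of the idea: at each node (t, u) of the RNN-T output lattice, the full vocabulary distribution is reduced to three probabilities (the correct next label "y", "blank", and the remaining mass), and the student is trained to match the teacher's three-way distribution via KL divergence. The function name, tensor shapes, and blank_id convention are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn.functional as F

def rnnt_distillation_loss(teacher_logits, student_logits, labels, blank_id=0):
    """Collapsed-lattice KD loss for RNN-T (sketch).

    teacher_logits, student_logits: [B, T, U+1, V] joint-network outputs.
    labels: [B, U] ground-truth label ids (LongTensor).
    Padding/masking of variable-length utterances is omitted for brevity.
    """
    t_logp = F.log_softmax(teacher_logits, dim=-1)  # [B, T, U+1, V]
    s_logp = F.log_softmax(student_logits, dim=-1)

    B, T, U1, V = t_logp.shape
    # Index of the "correct" next label at each u; pad the final row (u = U)
    # with blank so the gather below is well-defined everywhere.
    pad = torch.full((B, 1), blank_id, dtype=labels.dtype, device=labels.device)
    y_idx = torch.cat([labels, pad], dim=1)              # [B, U+1]
    y_idx = y_idx[:, None, :, None].expand(B, T, U1, 1)  # [B, T, U+1, 1]

    def collapse(logp):
        p_y = logp.gather(-1, y_idx).squeeze(-1).exp()      # P(y_{u+1} | t, u)
        p_blank = logp[..., blank_id].exp()                 # P(blank | t, u)
        p_rest = (1.0 - p_y - p_blank).clamp_min(1e-8)      # remaining mass
        return torch.stack([p_y, p_blank, p_rest], dim=-1)  # [B, T, U+1, 3]

    p_t = collapse(t_logp)
    p_s = collapse(s_logp)
    # KL(teacher || student) over the three-way distribution at each
    # lattice node, averaged over all nodes.
    kl = (p_t * (p_t.clamp_min(1e-8).log() - p_s.clamp_min(1e-8).log())).sum(-1)
    return kl.mean()
```

Because only three probabilities per lattice node are retained, the loss avoids materializing teacher posteriors over the full vocabulary at every (t, u) node, which is what makes this form of RNN-T distillation cheap in memory and compute.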


Related research:

Two-Pass End-to-End ASR Model Compression (01/08/2022)
Speech recognition on smart devices is challenging owing to the small me...

Robust Knowledge Distillation from RNN-T Models With Noisy Training Labels Using Full-Sum Loss (03/10/2023)
This work studies knowledge distillation (KD) and addresses its constrai...

DistillW2V2: A Small and Streaming Wav2vec 2.0 Based ASR Model (03/16/2023)
Wav2vec 2.0 (W2V2) has shown impressive performance in automatic speech ...

Token-Level Ensemble Distillation for Grapheme-to-Phoneme Conversion (04/06/2019)
Grapheme-to-phoneme (G2P) conversion is an important task in automatic s...

A scalable convolutional neural network for task-specified scenarios via knowledge distillation (09/19/2016)
In this paper, we explore the redundancy in convolutional neural network...

Reducing the gap between streaming and non-streaming Transducer-based ASR by adaptive two-stage knowledge distillation (06/27/2023)
Transducer is one of the mainstream frameworks for streaming speech reco...

Temporal Knowledge Distillation for On-device Audio Classification (10/27/2021)
Improving the performance of on-device audio classification models remai...
