Multi-stage Progressive Compression of Conformer Transducer for On-device Speech Recognition

10/01/2022
by   Jash Rathod, et al.

The limited memory bandwidth of smart devices motivates the development of smaller Automatic Speech Recognition (ASR) models. One way to obtain a smaller model is to apply model compression techniques. Knowledge distillation (KD) is a popular model compression approach that has been shown to reduce model size with relatively little degradation in performance. In this approach, knowledge is distilled from a trained, large teacher model to a smaller student model. Transducer-based models have recently been shown to perform well for on-device streaming ASR, while conformer models are effective at handling long-term dependencies. Hence, in this work we employ a streaming transducer architecture with a conformer encoder. We propose a multi-stage progressive approach to compress the conformer transducer model using KD, progressively updating our teacher model with the distilled student model in a multi-stage setup. On the standard LibriSpeech dataset, our experiments achieve compression rates greater than 60% without significant degradation in performance compared to the larger teacher model.
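To make the multi-stage idea concrete, below is a minimal sketch of progressive knowledge distillation in which the distilled student of one stage becomes the teacher of the next. The toy encoder stands in for the paper's conformer transducer encoder, and the KL-divergence soft-label loss, temperature, and per-stage width schedule are illustrative assumptions, not the authors' exact training recipe.

```python
# Minimal sketch of multi-stage progressive knowledge distillation (KD).
# Assumptions not taken from the paper: ToyEncoder stands in for a conformer
# transducer encoder, and the KD loss, temperature, and width schedule are
# illustrative choices only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyEncoder(nn.Module):
    """Stand-in for a conformer encoder: maps acoustic features to logits."""
    def __init__(self, feat_dim: int, hidden_dim: int, vocab_size: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, vocab_size),
        )

    def forward(self, x):
        return self.net(x)

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-label KD loss: KL divergence between teacher and student outputs."""
    t = temperature
    return F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)

def distill_stage(teacher, student, loader, epochs=1, lr=1e-3):
    """Train one student against a frozen teacher for a single stage."""
    teacher.eval()
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(epochs):
        for feats in loader:
            with torch.no_grad():
                t_logits = teacher(feats)
            loss = kd_loss(student(feats), t_logits)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student

# Multi-stage setup: the student distilled at stage k is promoted to teacher
# for stage k+1, progressively shrinking the model (widths are hypothetical).
feat_dim, vocab_size = 80, 256
widths = [512, 256, 128]
loader = [torch.randn(8, feat_dim) for _ in range(4)]  # dummy feature batches

teacher = ToyEncoder(feat_dim, widths[0], vocab_size)
for w in widths[1:]:
    student = ToyEncoder(feat_dim, w, vocab_size)
    teacher = distill_stage(teacher, student, loader)  # student becomes teacher
```

The key design point the sketch tries to capture is that compression is spread over several stages: each stage distills into a moderately smaller student, which then serves as the teacher for the next stage, rather than distilling directly from the largest model into the smallest one.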


Related research

01/08/2022 · Two-Pass End-to-End ASR Model Compression
Speech recognition on smart devices is challenging owing to the small me...

03/16/2023 · DistillW2V2: A Small and Streaming Wav2vec 2.0 Based ASR Model
Wav2vec 2.0 (W2V2) has shown impressive performance in automatic speech ...

03/29/2021 · Shrinking Bigfoot: Reducing wav2vec 2.0 footprint
Wav2vec 2.0 is a state-of-the-art speech recognition model which maps sp...

08/31/2023 · Knowledge Distillation from Non-streaming to Streaming ASR Encoder using Auxiliary Non-streaming Layer
Streaming automatic speech recognition (ASR) models are restricted from ...

09/30/2020 · Pea-KD: Parameter-efficient and Accurate Knowledge Distillation
How can we efficiently compress a model while maintaining its performanc...

02/20/2023 · Progressive Knowledge Distillation: Building Ensembles for Efficient Inference
We study the problem of progressive distillation: Given a large, pre-tra...

11/07/2022 · Streaming, fast and accurate on-device Inverse Text Normalization for Automatic Speech Recognition
Automatic Speech Recognition (ASR) systems typically yield output in lex...
