4-bit Conformer with Native Quantization Aware Training for Speech Recognition

03/29/2022
by Shaojin Ding et al.

Reducing latency and model size has long been a central research problem for live Automatic Speech Recognition (ASR) applications. Along this direction, model quantization has become an increasingly popular approach for compressing neural networks and reducing computational cost. Most existing practical ASR systems apply post-training 8-bit quantization. To achieve a higher compression rate without introducing additional performance regression, in this study we develop 4-bit ASR models with native quantization aware training, which leverages native integer operations to effectively optimize both training and inference. We conducted two experiments on state-of-the-art Conformer-based ASR models to evaluate our proposed quantization technique. First, we explored the impact of different precisions for both weight and activation quantization on the LibriSpeech dataset, and obtained a lossless 4-bit Conformer model with a 7.7x size reduction relative to the float32 model. Following this, we investigated, for the first time, the viability of 4-bit quantization on a practical ASR system trained with large-scale datasets, and produced a lossless Conformer ASR model with mixed 4-bit and 8-bit weights that achieves a 5x size reduction relative to the float32 model.
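As background for how quantization aware training (QAT) operates, the following is a minimal sketch of symmetric per-tensor 4-bit fake quantization with a straight-through estimator, written in PyTorch. The function name `fake_quantize` and the scheme shown here are illustrative assumptions, not the paper's implementation: the paper's "native" QAT runs true integer operations during training, whereas this sketch simulates quantization in float.

```python
import torch

def fake_quantize(x: torch.Tensor, num_bits: int = 4) -> torch.Tensor:
    """Symmetric per-tensor fake quantization with a straight-through
    estimator, as commonly used in QAT.

    Hypothetical sketch -- not the paper's exact native-integer recipe.
    """
    qmax = 2 ** (num_bits - 1) - 1               # e.g. 7 for signed 4-bit
    scale = x.detach().abs().max().clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
    dequant = q * scale
    # Straight-through estimator: the forward pass uses the quantized
    # value, while gradients flow back as if quantization were identity.
    return x + (dequant - x).detach()

# Example: quantize a weight matrix inside a training forward pass.
w = torch.randn(256, 256, requires_grad=True)
w_q = fake_quantize(w, num_bits=4)
loss = w_q.sum()
loss.backward()  # gradients reach w via the straight-through estimator
```

The `x + (dequant - x).detach()` trick keeps the quantized value in the forward pass while routing gradients to the full-precision weights, which is what allows low-bit models to be trained stably rather than quantized only after training.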
