Incremental Layer-wise Self-Supervised Learning for Efficient Speech Domain Adaptation On Device

10/01/2021
by   Zhouyuan Huo, et al.

Streaming end-to-end speech recognition models have been widely deployed on mobile devices and show significant improvements in efficiency. These models are typically trained on the server using transcribed speech data. However, the server data distribution can differ substantially from the data distribution on user devices, which could affect model performance. On-device training faces two main challenges: limited reliable labels and limited training memory. While self-supervised learning algorithms can mitigate the mismatch between domains using unlabeled data, they are not directly applicable on mobile devices because of the memory constraint. In this paper, we propose an incremental layer-wise self-supervised learning algorithm for efficient speech domain adaptation on mobile devices, in which only one layer is updated at a time. Extensive experimental results demonstrate that the proposed algorithm obtains a Word Error Rate (WER) on the target domain 24.2% better than the supervised baseline and uses 89.7% less training memory than the end-to-end self-supervised learning algorithm.
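The idea of updating only one layer at a time can be illustrated with a toy sketch. This is not the paper's implementation: the abstract does not specify the network or the self-supervised loss, so the chain of scalar "layers", the denoising objective, and the names `train_one_layer` and `train_layerwise` are all assumptions made for illustration. The point is only the schedule: at each stage, a single layer receives gradient updates while every other layer stays frozen, so gradient state is needed for one layer at a time.

```python
import random

def predict(weights, x):
    # Toy "network" (assumption): a chain of scalar linear layers applied in order.
    h = x
    for w in weights:
        h = w * h
    return h

def mean_loss(weights, data):
    # Mean squared error between the prediction and the clean target.
    return sum((predict(weights, x_in) - t) ** 2 for x_in, t in data) / len(data)

def layer_gradient(weights, idx, x_in, target):
    # Gradient of the squared error with respect to weights[idx] only.
    others = 1.0
    for j, w in enumerate(weights):
        if j != idx:
            others *= w
    error = predict(weights, x_in) - target
    return 2.0 * error * others * x_in

def train_one_layer(weights, idx, data, steps=200, lr=0.05):
    # Update a single layer; all other layers are frozen for this stage.
    weights = list(weights)
    for _ in range(steps):
        grad = sum(layer_gradient(weights, idx, x, t) for x, t in data) / len(data)
        weights[idx] -= lr * grad
    return weights

def train_layerwise(weights, data, steps=200, lr=0.05):
    # Incremental schedule: visit the layers one by one, bottom to top.
    for idx in range(len(weights)):
        weights = train_one_layer(weights, idx, data, steps, lr)
    return weights

rng = random.Random(0)
# Stand-in self-supervised task (assumption): recover a clean value from a
# noisy copy of itself, so no human labels are needed.
clean = [rng.uniform(-1.0, 1.0) for _ in range(32)]
data = [(x + rng.gauss(0.0, 0.05), x) for x in clean]

init = [0.5, 0.5, 0.5]
trained = train_layerwise(init, data)
print("loss before:", mean_loss(init, data))
print("loss after: ", mean_loss(trained, data))
```

In a real model the saving would come from storing gradients and optimizer state for one layer instead of all layers; this toy does not attempt to reproduce the memory or WER figures reported above.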

research
08/31/2021

Self-Supervised Learning Based Domain Adaptation for Robust Speaker Verification

Large performance degradation is often observed for speaker verificatio...
research
02/03/2022

Self-supervised Learning with Random-projection Quantizer for Speech Recognition

We present a simple and effective self-supervised learning approach for ...
research
06/27/2022

Wav2Vec-Aug: Improved self-supervised training with limited data

Self-supervised learning (SSL) of speech representations has received mu...
research
10/27/2022

Training Autoregressive Speech Recognition Models with Limited in-domain Supervision

Advances in self-supervised learning have significantly reduced the amou...
research
03/29/2020

Self-Supervised Learning for Domain Adaptation on Point-Clouds

Self-supervised learning (SSL) allows to learn useful representations fr...
research
12/07/2021

Training end-to-end speech-to-text models on mobile phones

Training the state-of-the-art speech-to-text (STT) models in mobile devi...
research
03/01/2022

Low-Cost On-device Partial Domain Adaptation (LoCO-PDA): Enabling efficient CNN retraining on edge devices

With the increased deployment of Convolutional Neural Networks (CNNs) on...