Unsupervised Fine-Tuning Data Selection for ASR Using Self-Supervised Speech Models

12/03/2022
by Reem Gody, et al.

Self-supervised learning (SSL) can leverage unlabeled data to boost the performance of automatic speech recognition (ASR) models when only a small amount of transcribed speech is available. This raises the question of which subset of the available unlabeled data should be selected for transcription. Our work investigates unsupervised data selection techniques for fine-tuning the HuBERT model under a limited transcription budget. We study the impact of speaker diversity, gender bias, and topic diversity on downstream ASR performance. We also devise two novel techniques for unsupervised data selection, pre-training loss based data selection and the perplexity of byte pair encoded clustered units (PBPE), and show how they compare to pure random data selection. Finally, we analyze the correlations among the inherent characteristics of the selected fine-tuning subsets, as well as how these characteristics correlate with the resulting word error rate (WER). We demonstrate the importance of token diversity, speaker diversity, and topic diversity in achieving the best WER.
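The abstract does not spell out how a score-based selection is carried out, so the following is only a rough sketch of the general idea of choosing utterances under a transcription budget. It is not the authors' PBPE method: a plain add-one-smoothed unigram perplexity over raw cluster-unit IDs is swapped in for the PBPE score (no byte pair encoding is applied), and ranking by high perplexity is an assumption. The names `unigram_perplexity`, `select_under_budget`, the `units` and `duration` fields, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def unigram_perplexity(tokens, counts, total, vocab_size):
    """Perplexity of one token sequence under an add-one-smoothed unigram model."""
    log_prob = 0.0
    for t in tokens:
        p = (counts[t] + 1) / (total + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(tokens))


def select_under_budget(utterances, scores, budget_seconds):
    """Greedily keep the highest-scoring utterances that still fit the budget."""
    order = sorted(range(len(utterances)), key=lambda i: scores[i], reverse=True)
    chosen, used = [], 0.0
    for i in order:
        dur = utterances[i]["duration"]
        if used + dur <= budget_seconds:
            chosen.append(i)
            used += dur
    return chosen


# Toy pool: each utterance carries hypothetical discrete unit IDs
# (standing in for clustered SSL features) and a duration in seconds.
utterances = [
    {"id": "a", "duration": 4.0, "units": [1, 1, 1, 1]},
    {"id": "b", "duration": 3.0, "units": [1, 2, 3, 4]},
    {"id": "c", "duration": 5.0, "units": [1, 1, 2, 2]},
    {"id": "d", "duration": 2.0, "units": [5, 6, 7, 8]},
]

# Estimate the unigram model on the whole unlabeled pool.
counts = Counter(u for utt in utterances for u in utt["units"])
total = sum(counts.values())
vocab_size = len(counts)

# Assumption: higher perplexity is treated as "more informative".
scores = [
    unigram_perplexity(utt["units"], counts, total, vocab_size)
    for utt in utterances
]
chosen = select_under_budget(utterances, scores, budget_seconds=6.0)
chosen_ids = [utterances[i]["id"] for i in chosen]
print(chosen_ids)
```

A pure random baseline, the comparison point named in the abstract, can reuse `select_under_budget` by passing random scores in place of the perplexities.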

research
03/18/2022

Towards Representative Subset Selection for Self-Supervised Speech Recognition

Self-supervised speech recognition models require considerable labeled t...
research
07/29/2022

Multiple-hypothesis RNN-T Loss for Unsupervised Fine-tuning and Self-training of Neural Transducer

This paper proposes a new approach to perform unsupervised fine-tuning a...
research
03/12/2023

Fine-tuning Strategies for Faster Inference using Speech Self-Supervised Models: A Comparative Study

Self-supervised learning (SSL) has allowed substantial progress in Autom...
research
08/28/2023

Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition

In recent years, speech-based self-supervised learning (SSL) has made si...
research
02/26/2023

Speech Corpora Divergence Based Unsupervised Data Selection for ASR

Selecting application scenarios matching data is important for the autom...
research
10/10/2021

Personalizing ASR with limited data using targeted subset selection

We study the task of personalizing ASR models to a target non-native spe...
research
04/04/2022

A Study of Gender Impact in Self-supervised Models for Speech-to-Text Systems

Self-supervised models for speech processing emerged recently as popular...
