Sample-Efficient Unsupervised Domain Adaptation of Speech Recognition Systems A case study for Modern Greek

12/31/2022
by   Georgios Paraskevopoulos, et al.
0

Modern speech recognition systems exhibits rapid performance degradation under domain shift. This issue is especially prevalent in data-scarce settings, such as low-resource languages, where diversity of training data is limited. In this work we propose M2DS2, a simple and sample-efficient finetuning strategy for large pretrained speech models, based on mixed source and target domain self-supervision. We find that including source domain self-supervision stabilizes training and avoids mode collapse of the latent representations. For evaluation, we collect HParl, a 120 hour speech corpus for Greek, consisting of plenary sessions in the Greek Parliament. We merge HParl with two popular Greek corpora to create GREC-MD, a test-bed for multi-domain evaluation of Greek ASR systems. In our experiments we find that, while other Unsupervised Domain Adaptation baselines fail in this resource-constrained environment, M2DS2 yields significant improvements for cross-domain adaptation, even when a only a few hours of in-domain audio are available. When we relax the problem in a weakly supervised setting, we find that independent adaptation for audio using M2DS2 and language using simple LM augmentation techniques is particularly effective, yielding word error rates comparable to the fully supervised baselines.

READ FULL TEXT
research
09/12/2021

Unsupervised Domain Adaptation Schemes for Building ASR in Low-resource Languages

Building an automatic speech recognition (ASR) system from scratch requi...
research
07/19/2017

Unsupervised Domain Adaptation for Robust Speech Recognition via Variational Autoencoder-Based Data Augmentation

Domain mismatch between training and testing can lead to significant deg...
research
05/30/2023

Weakly-supervised forced alignment of disfluent speech using phoneme-level modeling

The study of speech disorders can benefit greatly from time-aligned data...
research
11/26/2020

Unsupervised Domain Adaptation for Speech Recognition via Uncertainty Driven Self-Training

The performance of automatic speech recognition (ASR) systems typically ...
research
08/02/2020

Cross-Domain Adaptation of Spoken Language Identification for Related Languages: The Curious Case of Slavic Languages

State-of-the-art spoken language identification (LID) systems, which are...
research
01/24/2020

Data Techniques For Online End-to-end Speech Recognition

Practitioners often need to build ASR systems for new use cases in a sho...
research
12/17/2019

Libri-Light: A Benchmark for ASR with Limited or No Supervision

We introduce a new collection of spoken English audio suitable for train...

Please sign up or login with your details

Forgot password? Click here to reset