Effectiveness of text to speech pseudo labels for forced alignment and cross lingual pretrained models for low resource speech recognition

03/31/2022
by Anirudh Gupta, et al.

In recent years, end-to-end (E2E) automatic speech recognition (ASR) systems have achieved promising results given sufficient resources. Even for languages with little labelled data, state-of-the-art E2E ASR systems can be developed by pretraining on large amounts of data from high-resource languages and fine-tuning on low-resource languages. For many low-resource languages, however, current approaches remain difficult to apply, since in many cases labelled data is not available in the open domain. In this paper we present an approach to create labelled data for Maithili, Bhojpuri and Dogri by utilising pseudo labels from text-to-speech for forced alignment. The created data was inspected for quality and then used to train a transformer-based wav2vec 2.0 ASR model. All data and models are available in the open domain.

research
03/30/2022

Vakyansh: ASR Toolkit for Low Resource Indic languages

We present Vakyansh, an end to end toolkit for Speech Recognition in Ind...
research
08/26/2022

Effectiveness of Mining Audio and Text Pairs from Public Data for Improving ASR Systems for Low-Resource Languages

End-to-end (E2E) models have become the default choice for state-of-the-...
research
10/06/2021

Integrating Categorical Features in End-to-End ASR

All-neural, end-to-end ASR systems gained rapid interest from the speech...
research
09/16/2022

An Automatic Speech Recognition System for Bengali Language based on Wav2Vec2 and Transfer Learning

An independent, automated method of decoding and transcribing oral speec...
research
11/07/2020

Acoustics Based Intent Recognition Using Discovered Phonetic Units for Low Resource Languages

With recent advancements in language technologies, humans are now interac...
research
07/24/2023

Code-Switched Urdu ASR for Noisy Telephonic Environment using Data Centric Approach with Hybrid HMM and CNN-TDNN

Call Centers have huge amount of audio data which can be used for achiev...
research
05/19/2023

Unsupervised ASR via Cross-Lingual Pseudo-Labeling

Recent work has shown that it is possible to train an unsupervised autom...
