Improved Speech Pre-Training with Supervision-Enhanced Acoustic Unit

12/07/2022
by Pengcheng Li, et al.

Speech pre-training has shown great success in learning useful and general latent representations from large-scale unlabeled data. Built on a well-designed self-supervised learning pattern, pre-trained models can serve a wide range of downstream speech tasks such as automatic speech recognition. To take full advantage of the labeled data available in low-resource tasks, we present an improved pre-training method that introduces a supervision-enhanced acoustic unit (SEAU) pattern to intensify the expression of context information and reduce the training cost. Encoder representations extracted through the SEAU pattern are used to generate more representative target units for the HuBERT pre-training process. The proposed method, named SeHuBERT, achieves relative word error rate reductions of 10.5% on the Turkmen speech recognition task with 500 hours and 100 hours of fine-tuning data, respectively. Extended to more languages and more data, SeHuBERT also achieves a relative word error rate reduction of approximately 10% at a lower training cost compared with HuBERT.
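The abstract does not include code, but the recipe it builds on (clustering encoder representations into discrete units that serve as masked-prediction targets for HuBERT-style pre-training) can be sketched briefly. In the minimal Python sketch below, a supervision-trained encoder stands in for the SEAU pattern; every name, shape, and hyperparameter (supervised_encoder, n_units=500, the MiniBatchKMeans settings) is an illustrative assumption, not the authors' implementation.

    # Hypothetical sketch: derive discrete target units for HuBERT-style
    # pre-training by clustering frame-level encoder representations.
    # The supervised encoder stands in for the paper's SEAU pattern.
    import numpy as np
    from sklearn.cluster import MiniBatchKMeans

    def extract_features(supervised_encoder, utterances):
        """Run each utterance through the supervision-trained encoder and
        stack the frame-level representations into (n_frames, feat_dim)."""
        feats = [supervised_encoder(u) for u in utterances]  # each: (T_i, D)
        return np.concatenate(feats, axis=0)

    def fit_unit_model(features, n_units=500):
        """Cluster frame representations; the cluster IDs become the
        discrete acoustic units used as masked-prediction targets."""
        km = MiniBatchKMeans(n_clusters=n_units, batch_size=10000, n_init=3)
        km.fit(features)
        return km

    def label_utterance(unit_model, supervised_encoder, utterance):
        """Map one utterance to its per-frame target-unit sequence."""
        frames = supervised_encoder(utterance)   # (T, D)
        return unit_model.predict(frames)        # (T,) integer unit IDs

In vanilla HuBERT these units come from k-means on MFCCs or on an earlier model's intermediate-layer features; the paper's claim is that units derived from supervision-enhanced representations make more representative pre-training targets.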

research
10/13/2022

HuBERT-TR: Reviving Turkish Automatic Speech Recognition with Self-supervised Speech Representation Learning

While the Turkish language is listed among low-resource languages, liter...
research
10/28/2019

Unsupervised pre-training for sequence to sequence speech recognition

This paper proposes a novel approach to pre-train encoder-decoder sequen...
research
11/04/2022

Biased Self-supervised learning for ASR

Self-supervised learning via masked prediction pre-training (MPPT) has s...
research
09/14/2021

Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition

This paper is a study of performance-efficiency trade-offs in pre-traine...
research
10/26/2022

UFO2: A unified pre-training framework for online and offline speech recognition

In this paper, we propose a Unified pre-training Framework for Online an...
research
07/08/2022

Tandem Multitask Training of Speaker Diarisation and Speech Recognition for Meeting Transcription

Self-supervised-learning-based pre-trained models for speech data, such ...
