Improving Transformer-based Speech Recognition Using Unsupervised Pre-training

10/22/2019
by Dongwei Jiang, et al.

Speech recognition technologies are gaining enormous popularity in various industrial applications. However, building a good speech recognition system usually requires large amounts of transcribed data, which is expensive to collect. To tackle this problem, we propose a novel unsupervised pre-training method called masked predictive coding, which can be applied to unsupervised pre-training of Transformer-based models. Experiments on HKUST show that, using the same training data and other open-source Mandarin data, we can reduce the CER of a strong Transformer-based baseline by 3.7%, and reduce the CER on AISHELL-1 by 12.9%.
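The abstract only names masked predictive coding; below is a minimal sketch, assuming a BERT-style setup in which a fraction of input filterbank frames is zeroed out and a Transformer encoder is trained to reconstruct them with an L1 loss on the masked positions. The mask ratio, masking strategy, layer sizes, and the class/parameter names (MPCPretrainer, mask_prob, etc.) are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class MPCPretrainer(nn.Module):
    """Masked predictive coding sketch: zero out a fraction of input frames and
    train a Transformer encoder to reconstruct them (L1 loss on masked frames).
    Positional encodings and convolutional downsampling are omitted for brevity."""

    def __init__(self, feat_dim=80, d_model=256, nhead=4, num_layers=6, mask_prob=0.15):
        super().__init__()
        self.mask_prob = mask_prob
        self.input_proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.output_proj = nn.Linear(d_model, feat_dim)

    def forward(self, feats):
        # feats: (batch, time, feat_dim) log-Mel filterbank features
        mask = torch.rand(feats.shape[:2], device=feats.device) < self.mask_prob
        masked_feats = feats.masked_fill(mask.unsqueeze(-1), 0.0)  # zero masked frames
        hidden = self.encoder(self.input_proj(masked_feats))
        pred = self.output_proj(hidden)
        # L1 reconstruction loss computed only on the masked positions
        return (pred - feats).abs()[mask].mean()

# Usage (hypothetical shapes): pre-train on untranscribed audio, then reuse
# pretrainer.encoder to initialize a supervised Transformer ASR model.
pretrainer = MPCPretrainer()
feats = torch.randn(8, 400, 80)  # 8 utterances, 400 frames, 80-dim FBANK
loss = pretrainer(feats)
loss.backward()
```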

Related research

05/20/2020
A Further Study of Unsupervised Pre-training for Transformer Based Speech Recognition
Building a good speech recognition system usually requires large amounts...

04/11/2019
wav2vec: Unsupervised Pre-training for Speech Recognition
We explore unsupervised pre-training for speech recognition by learning ...

02/12/2021
Bi-APC: Bidirectional Autoregressive Predictive Coding for Unsupervised Pre-training and Its Application to Children's ASR
We present a bidirectional unsupervised model pre-training (UPT) method ...

07/29/2020
Transformer based unsupervised pre-training for acoustic representation learning
Computational audio analysis has become a central issue in associated ar...

01/28/2020
Unsupervised Pre-training of Bidirectional Speech Encoders via Masked Reconstruction
We propose an approach for pre-training speech representations via a mas...

11/27/2019
AIPNet: Generative Adversarial Pre-training of Accent-invariant Networks for End-to-end Speech Recognition
As one of the major sources in speech variability, accents have posed a ...
