Self-Training for End-to-End Speech Recognition

09/19/2019
by Jacob Kahn, et al.

We revisit self-training in the context of end-to-end speech recognition. We demonstrate that training with pseudo-labels can substantially improve the accuracy of a baseline model by leveraging unlabelled data. Key to our approach are a strong baseline acoustic and language model used to generate the pseudo-labels, a robust and stable beam-search decoder, and a novel ensemble approach used to increase pseudo-label diversity. Experiments on the LibriSpeech corpus show that self-training with a single model can yield a 21% relative WER improvement on clean data over a baseline trained on 100 hours of labelled data. We also evaluate label filtering approaches to increase pseudo-label quality. With an ensemble of six models in conjunction with label filtering, self-training yields a 26% relative WER improvement, further closing the gap between the baseline and an oracle model trained with all of the labels.
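The recipe the abstract describes can be summarized as: decode unlabelled audio with a baseline acoustic model plus language model via beam search, filter out low-quality hypotheses, and retrain on the union of labelled and pseudo-labelled data, optionally pooling pseudo-labels from an ensemble of baseline models for diversity. The sketch below illustrates that loop; it is not the paper's implementation, and the `Decoder`, `Example`, and `train` interfaces are hypothetical stand-ins for a real end-to-end ASR toolkit.

```python
"""Illustrative sketch of one round of self-training for end-to-end ASR.

Assumptions: the decoding and training functions are placeholders, and the
score-based filtering thresholds are arbitrary examples, not the paper's.
"""
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Audio = Sequence[float]                          # placeholder acoustic features
Decoder = Callable[[Audio], Tuple[str, float]]   # returns (hypothesis, score)


@dataclass
class Example:
    audio: Audio
    transcript: str


def generate_pseudo_labels(unlabelled: Sequence[Audio], decode: Decoder,
                           min_score: float = -1.0,
                           max_words: int = 400) -> List[Example]:
    """Decode unlabelled audio with the baseline acoustic + language model,
    then apply simple heuristic filtering to raise pseudo-label quality."""
    pseudo = []
    for audio in unlabelled:
        hypothesis, score = decode(audio)        # beam-search decoding
        if score < min_score:
            continue                             # drop low-confidence output
        if not hypothesis or len(hypothesis.split()) > max_words:
            continue                             # drop empty or runaway output
        pseudo.append(Example(audio, hypothesis))
    return pseudo


def self_train(labelled: List[Example], unlabelled: Sequence[Audio],
               decoders: Sequence[Decoder],
               train: Callable[[List[Example]], object]) -> object:
    """Train on labelled data plus pseudo-labels. With several decoders
    (an ensemble of baseline models), each contributes its own pseudo-labels,
    increasing label diversity before the final model is trained."""
    combined = list(labelled)
    for decode in decoders:
        combined.extend(generate_pseudo_labels(unlabelled, decode))
    return train(combined)
```

In practice the filtering step is where much of the gain comes from, since sequence-to-sequence decoders can emit empty or looping transcripts; the length and score checks above are only simple proxies for such filters.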
