Pushing the Limits of Non-Autoregressive Speech Recognition

04/07/2021
by Edwin G. Ng, et al.

We apply recent advancements in end-to-end speech recognition to non-autoregressive automatic speech recognition. We push the limits of non-autoregressive state-of-the-art results for multiple datasets: LibriSpeech, Fisher+Switchboard and Wall Street Journal. Key to our recipe, we leverage CTC on giant Conformer neural network architectures with SpecAugment and wav2vec2 pre-training. We achieve 1.8% WER on LibriSpeech test-clean and 5.1% WER on Switchboard, all without a language model.
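The recipe pairs CTC with large Conformer encoders, SpecAugment, and wav2vec2 pre-training; what makes the system non-autoregressive is that decoding reduces to an independent per-frame argmax over the CTC outputs, with repeated tokens collapsed and blanks removed, so no output token depends on previously emitted tokens. Below is a minimal sketch of that greedy CTC decoding step in plain Python; the ctc_greedy_decode function, the BLANK index, and the toy logits are illustrative assumptions, not code from the paper.

```python
# Minimal sketch (not the authors' code): greedy CTC decoding, the step that
# makes the recipe non-autoregressive. An encoder (e.g. a Conformer) emits a
# per-frame score over tokens plus a blank; decoding takes the argmax per
# frame, collapses consecutive repeats, and drops blanks.

BLANK = 0  # index of the CTC blank symbol (assumption for this sketch)

def ctc_greedy_decode(frame_logits):
    """frame_logits: list of per-frame score lists, shape [T][vocab]."""
    # 1. Per-frame argmax (no autoregressive dependency between frames).
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_logits]
    # 2. Collapse consecutive repeats, then remove blanks.
    decoded, prev = [], None
    for tok in best:
        if tok != prev and tok != BLANK:
            decoded.append(tok)
        prev = tok
    return decoded

# Toy example: 6 frames over a 4-symbol vocabulary (0 = blank).
logits = [
    [0.1, 0.8, 0.05, 0.05],   # -> 1
    [0.1, 0.7, 0.1, 0.1],     # -> 1 (repeat, collapsed)
    [0.9, 0.03, 0.03, 0.04],  # -> blank
    [0.1, 0.1, 0.1, 0.7],     # -> 3
    [0.2, 0.1, 0.6, 0.1],     # -> 2
    [0.8, 0.1, 0.05, 0.05],   # -> blank
]
print(ctc_greedy_decode(logits))  # [1, 3, 2]
```

Because every frame is decoded independently, the entire output sequence can be produced in a single parallel pass, which is the source of the latency advantage over autoregressive decoders.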


Related research

04/05/2021
SpeechStew: Simply Mix All Available Speech Recognition Data to Train One Large Neural Network
We present SpeechStew, a speech recognition model that is trained on a c...

03/12/2020
Hybrid Autoregressive Transducer (HAT)
This paper proposes and evaluates the hybrid autoregressive transducer (...

04/05/2019
Jasper: An End-to-End Convolutional Neural Acoustic Model
In this paper, we report state-of-the-art results on LibriSpeech among e...

05/11/2021
Investigating the Reordering Capability in CTC-based Non-Autoregressive End-to-End Speech Translation
We study the possibilities of building a non-autoregressive speech-to-te...

03/01/2019
KT-Speech-Crawler: Automatic Dataset Construction for Speech Recognition from YouTube Videos
In this paper, we describe KT-Speech-Crawler: an approach for automatic ...

04/06/2021
LT-LM: a novel non-autoregressive language model for single-shot lattice rescoring
Neural network-based language models are commonly used in rescoring appr...

07/22/2021
CarneliNet: Neural Mixture Model for Automatic Speech Recognition
End-to-end automatic speech recognition systems have achieved great accu...
