Cascaded CNN-resBiLSTM-CTC: An End-to-End Acoustic Model For Speech Recognition

10/29/2018
by   Xinpei Zhou, et al.
0

Automatic speech recognition (ASR) tasks are resolved by end-to-end deep learning models, which benefits us by less preparation of raw data, and easier transformation between languages. We propose a novel end-to-end deep learning model architecture namely cascaded CNN-resBiLSTM-CTC. In the proposed model, we add residual blocks in BiLSTM layers to extract sophisticated phoneme and semantic information together, and apply cascaded structure to pay more attention mining information of hard negative samples. By applying both simple Fast Fourier Transform (FFT) technique and n-gram language model (LM) rescoring method, we manage to achieve word error rate (WER) of 3.41 clean corpora. Furthermore, we propose a new batch-varied method to speed up the training process in length-varied tasks, which result in 25 time.

READ FULL TEXT
research
07/07/2022

End-to-end Speech-to-Punctuated-Text Recognition

Conventional automatic speech recognition systems do not produce punctua...
research
05/25/2020

Adapting End-to-End Speech Recognition for Readable Subtitles

Automatic speech recognition (ASR) systems are primarily evaluated on tr...
research
10/10/2016

Very Deep Convolutional Networks for End-to-End Speech Recognition

Sequence-to-sequence models have shown success in end-to-end speech reco...
research
02/11/2021

DEEPF0: End-To-End Fundamental Frequency Estimation for Music and Speech Signals

We propose a novel pitch estimation technique called DeepF0, which lever...
research
05/19/2020

Distilling Knowledge from Ensembles of Acoustic Models for Joint CTC-Attention End-to-End Speech Recognition

Knowledge distillation has been widely used to compress existing deep le...
research
05/07/2020

ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context

Convolutional neural networks (CNN) have shown promising results for end...
research
04/08/2019

Exploring Methods for the Automatic Detection of Errors in Manual Transcription

Quality of data plays an important role in most deep learning tasks. In ...

Please sign up or login with your details

Forgot password? Click here to reset