Attention-Based End-to-End Speech Recognition on Voice Search

07/22/2017
by Changhao Shan, et al.

Recently, there has been increasing interest in end-to-end speech recognition that directly transcribes speech to text without any predefined alignments. In this paper, we explore the use of an attention-based encoder-decoder model for Mandarin speech recognition on voice search. We propose a smoothing method for the attention mechanism and compare it with content attention and convolutional attention. Moreover, frame skipping is employed for fast training and convergence. On the XiaoMi TV voice search dataset, we achieve a character error rate (CER) of 3.58% and a sentence error rate (SER) of 7.43%. With a trigram language model, we reach a CER of 2.81%.
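
For reference, character error rate is conventionally computed as the Levenshtein (edit) distance between the hypothesis and reference character sequences, divided by the number of reference characters. The sketch below illustrates that standard definition only; it is not the authors' scoring script, and the function names `edit_distance` and `cer` are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Counts the minimum number of substitutions, insertions, and
    deletions needed to turn `hyp` into `ref`.
    """
    # prev[j] holds the distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference char missing from hyp)
                curr[j - 1] + 1,     # insertion (extra char in hyp)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def cer(refs, hyps):
    """Corpus-level CER: total edit distance over total reference characters."""
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("reference corpus is empty")
    return total_errors / total_chars
```

Because Mandarin text is scored per character, the strings can be passed in directly without word segmentation, for example `cer(["打开电视"], ["打开电视"])`. A sentence error rate would instead count the fraction of utterances with a nonzero edit distance.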
