Gated Recurrent Context: Softmax-free Attention for Online Encoder-Decoder Speech Recognition

07/10/2020, by Hyeonseung Lee et al.

Recently, attention-based encoder-decoder (AED) models have shown state-of-the-art performance in automatic speech recognition (ASR). Because the original AED models with global attention are not capable of online inference, various online attention schemes have been developed to reduce ASR latency for a better user experience. However, a common limitation of conventional softmax-based online attention approaches is that they introduce an additional hyperparameter governing the length of the attention window, so tuning it requires multiple training runs. To address this problem, we propose a novel softmax-free attention method and a modified formulation for online attention, neither of which needs any additional hyperparameter at the training phase. Through a number of ASR experiments, we demonstrate that the tradeoff between latency and performance of the proposed online attention technique can be controlled merely by adjusting a threshold at the test phase. Furthermore, the proposed methods show performance competitive with conventional global and online attention in terms of word error rate (WER).
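The abstract does not give the formulation, but the name "Gated Recurrent Context" and the softmax-free claim suggest a context vector built by a recurrent, sigmoid-gated convex update over encoder states, with a test-time threshold halting the scan for online decoding. The sketch below is an illustrative NumPy implementation under those assumptions, not the paper's exact method; the dot-product scoring function and the halting rule are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_recurrent_context(enc_states, dec_state, threshold=None):
    """Illustrative softmax-free attention: the context vector is
    accumulated recurrently with scalar sigmoid gates instead of
    softmax-normalized weights. `threshold`, used only at test time,
    halts the left-to-right scan early for online decoding.

    enc_states: (T, d) encoder hidden states
    dec_state:  (d,)   current decoder state
    """
    d = enc_states.shape[1]
    context = np.zeros(d)
    for h in enc_states:
        # Scalar gate from a scaled dot-product score (hypothetical choice).
        z = sigmoid(h @ dec_state / np.sqrt(d))
        # GRU-style convex update: no normalization over the whole sequence,
        # so no softmax and no training-time attention-window hyperparameter.
        context = (1.0 - z) * context + z * h
        # Online variant: stop once the gate drops below the test-time
        # threshold, bounding how far ahead the model must look.
        if threshold is not None and z < threshold:
            break
    return context
```

Because the gate is local to each frame, raising the threshold at test time shortens the scan (lower latency) while lowering it lets the context absorb more frames, which matches the abstract's claim that the latency/performance tradeoff is controlled by a single test-phase threshold.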

Related research

- 07/02/2021, Relaxed Attention: A Simple Method to Boost Performance of End-to-End Automatic Speech Recognition
  Recently, attention-based encoder-decoder (AED) models have shown high p...

- 02/28/2021, Alignment Knowledge Distillation for Online Streaming Attention-based Speech Recognition
  This article describes an efficient training method for online streaming...

- 08/12/2020, Online Automatic Speech Recognition with Listen, Attend and Spell Model
  The Listen, Attend and Spell (LAS) model and other attention-based autom...

- 01/15/2020, Transformer-based Online CTC/attention End-to-End Speech Recognition Architecture
  Recently, Transformer has gained success in automatic speech recognition...

- 04/24/2023, Self-regularised Minimum Latency Training for Streaming Transformer-based Speech Recognition
  This paper proposes a self-regularised minimum latency training (SR-MLT)...

- 09/15/2023, Chunked Attention-based Encoder-Decoder Model for Streaming Speech Recognition
  We study a streamable attention-based encoder-decoder model in which eit...
