Dual Causal/Non-Causal Self-Attention for Streaming End-to-End Speech Recognition

07/02/2021
by Niko Moritz, et al.

Attention-based end-to-end automatic speech recognition (ASR) systems have recently demonstrated state-of-the-art results on numerous tasks. However, applying self-attention and attention-based encoder-decoder models to streaming ASR remains challenging, because each word must be recognized shortly after it is spoken. In this work, we present the dual causal/non-causal self-attention (DCN) architecture, which, in contrast to restricted self-attention, prevents the overall context from growing beyond the look-ahead of a single layer when the layers are stacked into a deep architecture. DCN is compared to chunk-based and restricted self-attention using streaming transformer and conformer architectures. It improves ASR performance over restricted self-attention and is competitive with chunk-based self-attention, while offering the advantage of frame-synchronous processing. Combined with triggered attention, the proposed streaming end-to-end ASR systems obtained state-of-the-art results on the LibriSpeech, HKUST, and Switchboard ASR tasks.
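The central claim is that stacking restricted self-attention layers makes the look-ahead grow with depth, whereas DCN keeps it at the look-ahead of a single layer. The short sketch below illustrates this by tracking, for each output frame, the furthest input frame it depends on. It is not the authors' implementation. The mixing rule used for the DCN stream is an assumption: the non-causal output attends to non-causal frames up to the current frame and to causal-stream frames for the r look-ahead frames. Left context is left unbounded for simplicity, and the function names are hypothetical.

```python
from typing import List


def lookahead_restricted(num_layers: int, r: int) -> int:
    """Future input frames seen by output frame 0 after stacking
    restricted self-attention layers, each with look-ahead r."""
    T = num_layers * r + 2  # long enough that the sequence end never truncates
    dep: List[int] = list(range(T))  # dep[t]: furthest input frame feeding frame t
    for _ in range(num_layers):
        # Frame t attends to every frame s <= t + r of the previous layer,
        # so the look-ahead of the previous layer is inherited and grows by r.
        dep = [max(dep[: min(T, t + r + 1)]) for t in range(T)]
    return dep[0]


def lookahead_dcn(num_layers: int, r: int) -> int:
    """Same measurement for a dual causal/non-causal stack (ASSUMED mixing rule)."""
    T = num_layers * r + 2
    causal: List[int] = list(range(T))
    noncausal: List[int] = list(range(T))
    for _ in range(num_layers):
        # Causal stream: frame t sees only causal frames s <= t.
        new_causal = [max(causal[: t + 1]) for t in range(T)]
        # Non-causal stream (assumption): non-causal frames s <= t plus
        # causal frames t < s <= t + r, so look-ahead does not accumulate.
        new_noncausal = [
            max(max(noncausal[: t + 1]), max(causal[t : min(T, t + r + 1)]))
            for t in range(T)
        ]
        causal, noncausal = new_causal, new_noncausal
    return noncausal[0]


if __name__ == "__main__":
    for layers, r in [(12, 1), (6, 3)]:
        print(layers, r, lookahead_restricted(layers, r), lookahead_dcn(layers, r))
```

With 12 layers and a per-layer look-ahead of one frame, the restricted stack sees 12 future frames, while the sketched DCN stack sees only 1. This bounded latency is the property the abstract attributes to DCN.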
