Transformer-based End-to-End Speech Recognition with Local Dense Synthesizer Attention

10/23/2020
by Menglong Xu, et al.

Recently, several studies have reported that dot-product self-attention (SA) may not be indispensable to state-of-the-art Transformer models. Motivated by the fact that dense synthesizer attention (DSA), which dispenses with dot products and pairwise interactions, achieved competitive results on many language processing tasks, in this paper we first propose a DSA-based speech recognition model as an alternative to SA. To reduce the computational complexity and improve performance, we further propose local DSA (LDSA), which restricts the attention scope of DSA to a local range around the current central frame. Finally, we combine LDSA with SA to extract local and global information simultaneously. Experimental results on the Aishell-1 Mandarin speech recognition corpus show that the proposed LDSA-Transformer achieves a character error rate (CER) of 6.49%, comparable to that of the SA-Transformer, while requiring less computation. The proposed combination method not only achieves a CER of 6.18% but also has roughly the same number of parameters and computational complexity as the SA-Transformer. The implementation of the multi-head LDSA is available at https://github.com/mlxu995/multihead-LDSA.
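The core idea lends itself to a compact sketch: LDSA synthesizes a fixed-width window of attention weights directly from each frame's own representation, with no dot products or pairwise interactions, and applies them to the neighbouring frames. Below is a minimal, single-head PyTorch sketch of that mechanism; the class name `LocalDSA`, the `context_width` parameter, and the padding details are illustrative assumptions and do not reproduce the authors' multi-head implementation in the linked repository.

```python
# Hypothetical single-head Local Dense Synthesizer Attention (LDSA) sketch.
# Assumes PyTorch; names and hyperparameters are illustrative, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LocalDSA(nn.Module):
    """Synthesizes local attention weights from each frame itself (no dot products)."""

    def __init__(self, d_model: int, context_width: int = 15):
        super().__init__()
        assert context_width % 2 == 1, "window must be odd so it centres on the current frame"
        self.context_width = context_width            # local window size w
        self.w1 = nn.Linear(d_model, d_model)         # first synthesizer projection
        self.w2 = nn.Linear(d_model, context_width)   # maps each frame to w local weights
        self.value = nn.Linear(d_model, d_model)      # value projection, as in standard attention

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model)
        b, t, d = x.shape
        w = self.context_width
        # Per-frame local attention scores, synthesized from the frame alone: (batch, time, w).
        scores = self.w2(torch.relu(self.w1(x)))
        weights = F.softmax(scores, dim=-1)
        v = self.value(x)                              # (batch, time, d_model)
        # Gather a window of w neighbouring frames around each position.
        pad = w // 2
        v_padded = F.pad(v, (0, 0, pad, pad))          # zero-pad along the time axis
        windows = v_padded.unfold(1, w, 1)             # (batch, time, d_model, w)
        windows = windows.permute(0, 1, 3, 2)          # (batch, time, w, d_model)
        # Weighted sum over the local window.
        return torch.einsum("btw,btwd->btd", weights, windows)
```

As a quick check, `LocalDSA(d_model=256)(torch.randn(4, 100, 256))` returns a tensor of shape `(4, 100, 256)`. Frames near the utterance boundaries attend partly to zero padding in this sketch, which a fuller implementation would mask; the multi-head variant would additionally split `d_model` across heads before synthesizing the weights.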


