SRU++: Pioneering Fast Recurrence with Attention for Speech Recognition

10/11/2021
by Jing Pan, et al.

The Transformer architecture has been widely adopted as a dominant architecture in most sequence transduction tasks, including automatic speech recognition (ASR), since its attention mechanism excels at capturing long-range dependencies. While models built solely upon attention can be better parallelized than regular RNNs, the recently proposed SRU++ architecture combines fast recurrence with attention and exhibits strong sequence-modeling capability, achieving near-state-of-the-art results on various language modeling and machine translation tasks with improved compute efficiency. In this work, we present the advantages of applying SRU++ to ASR tasks by comparing it with Conformer across multiple ASR benchmarks, and we study how these benefits generalize to long-form speech inputs. On the popular LibriSpeech benchmark, our SRU++ model achieves 2.0% / 4.7% WER on test-clean / test-other, a competitive result compared with the state-of-the-art Conformer encoder under the same set-up. Moreover, our analysis shows that SRU++ surpasses Conformer on long-form speech input by a large margin.
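
To make the combination concrete, below is a minimal, single-head PyTorch sketch of an SRU++-style layer: a small self-attention block computes the projection that feeds SRU's fast elementwise recurrence, so the only strictly sequential work per time step is cheap vector operations. The class name SRUppLayerSketch, the single attention head, the zero-initialized gate parameters, and the simplified residual wiring are illustrative assumptions for this sketch, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SRUppLayerSketch(nn.Module):
    # Hypothetical simplification of an SRU++ layer: attention replaces
    # the SRU input projection, then an elementwise recurrence runs over time.
    def __init__(self, d_model: int, d_attn: int):
        super().__init__()
        # Attention is computed in a reduced dimension d_attn < d_model,
        # following the SRU++ idea of keeping the projection cheap.
        self.q_proj = nn.Linear(d_model, d_attn, bias=False)
        self.k_proj = nn.Linear(d_attn, d_attn, bias=False)
        self.v_proj = nn.Linear(d_attn, d_attn, bias=False)
        # Widen to the three signals the recurrence needs:
        # candidate state, forget-gate input, reset-gate input.
        self.out_proj = nn.Linear(d_attn, 3 * d_model, bias=False)
        self.vf = nn.Parameter(torch.zeros(d_model))  # peephole weights
        self.vr = nn.Parameter(torch.zeros(d_model))
        self.bf = nn.Parameter(torch.zeros(d_model))  # gate biases
        self.br = nn.Parameter(torch.zeros(d_model))
        self.scale = d_attn ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (T, B, d_model), time-major
        T, B, d = x.shape
        q = self.q_proj(x)                        # (T, B, d_attn)
        k, v = self.k_proj(q), self.v_proj(q)     # keys/values derived from the query
        attn = F.softmax(torch.einsum('tbd,sbd->bts', q, k) * self.scale, dim=-1)
        ctx = torch.einsum('bts,sbd->tbd', attn, v)
        u = self.out_proj(q + ctx)                # residual, then widen
        z, f_in, r_in = u.chunk(3, dim=-1)

        # Fast recurrence: only elementwise ops inside the time loop,
        # so every hidden dimension can be processed in parallel.
        c = x.new_zeros(B, d)
        outs = []
        for t in range(T):
            f = torch.sigmoid(f_in[t] + self.vf * c + self.bf)  # forget gate
            c = f * c + (1.0 - f) * z[t]                        # cell state
            r = torch.sigmoid(r_in[t] + self.vr * c + self.br)  # reset gate
            outs.append(r * c + (1.0 - r) * x[t])               # highway output
        return torch.stack(outs)                  # (T, B, d_model)

In a full encoder, layers like this would be stacked with layer normalization and dropout; the sketch only illustrates the division of labor the abstract refers to: attention captures long-range context in parallel, while the elementwise recurrence adds a sequential inductive bias at negligible per-step cost.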

