On the Usefulness of Self-Attention for Automatic Speech Recognition with Transformers

11/08/2020
by   Shucong Zhang, et al.

Self-attention models such as Transformers, which can capture temporal relationships without being limited by the distance between events, have given competitive speech recognition results. However, we note that the range of the learned context increases from the lower to the upper self-attention layers, whilst acoustic events often happen within short time spans in a left-to-right order. This leads to a question: for speech recognition, is a global view of the entire sequence useful for the upper self-attention encoder layers in Transformers? To investigate this, we train models whose encoders use self-attention in the lower layers and feed-forward layers in the upper layers, on Wall Street Journal and Switchboard. Compared to baseline Transformers, these models show no performance drop and even minor gains. We further develop a novel metric of the diagonality of attention matrices and find that the learned diagonality indeed increases from the lower to the upper encoder self-attention layers. We conclude that a global view is unnecessary when training the upper encoder layers.
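To make the "lower self-attention / upper feed-forward" encoder concrete, here is a minimal sketch assuming PyTorch. The class name HybridEncoder and all layer counts and dimensions are illustrative placeholders, not the configuration used in the paper.

```python
import torch
import torch.nn as nn

class HybridEncoder(nn.Module):
    """Transformer-style encoder: self-attention in the lower layers,
    plain position-wise feed-forward blocks in the upper layers.
    Hyperparameters here are illustrative, not the paper's setup."""

    def __init__(self, d_model=256, n_heads=4, d_ff=1024,
                 n_attn_layers=6, n_ff_layers=6):
        super().__init__()
        # Lower layers: standard Transformer encoder layers.
        self.attn_layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, n_heads, d_ff,
                                       batch_first=True)
            for _ in range(n_attn_layers)
        )
        # Upper layers: a Transformer layer with its self-attention
        # removed, i.e. a residual feed-forward block.
        self.ff_layers = nn.ModuleList(
            nn.Sequential(
                nn.LayerNorm(d_model),
                nn.Linear(d_model, d_ff),
                nn.ReLU(),
                nn.Linear(d_ff, d_model),
            )
            for _ in range(n_ff_layers)
        )

    def forward(self, x):  # x: (batch, time, d_model)
        for layer in self.attn_layers:
            x = layer(x)
        for block in self.ff_layers:
            x = x + block(x)  # residual connection
        return x
```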
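The abstract describes the diagonality metric only at a high level, so the sketch below is one plausible formulation, not the authors' exact definition: it scores an attention matrix by the expected distance of its attention mass from the main diagonal, normalised to [0, 1]. The function name diagonality is mine.

```python
import numpy as np

def diagonality(attn: np.ndarray) -> float:
    """Score in [0, 1]: 1.0 for a perfectly diagonal attention matrix,
    decreasing as attention mass moves off the diagonal.

    attn: (T, T) matrix whose rows are attention distributions
    (each row sums to 1). NOTE: this is an assumed formulation,
    not necessarily the metric defined in the paper.
    """
    T = attn.shape[0]
    rows = np.arange(T)[:, None]  # query positions i
    cols = np.arange(T)[None, :]  # key positions j
    # Expected |i - j| under each row's distribution, averaged over
    # rows, then normalised by the largest possible offset (T - 1).
    mean_offset = (attn * np.abs(rows - cols)).sum() / T
    return 1.0 - mean_offset / (T - 1)
```

As a sanity check, an identity matrix scores 1.0, while uniform attention over T = 8 positions scores 1 - (T + 1) / (3T) = 0.625:

```python
T = 8
print(diagonality(np.eye(T)))             # 1.0
print(diagonality(np.full((T, T), 1/T)))  # 0.625
```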

Related research

05/28/2020 - When Can Self-Attention Be Replaced by Feed Forward Layers?
Recently, self-attention models such as Transformers have given competit...

02/09/2021 - Train your classifier first: Cascade Neural Networks Training from upper layers to lower layers
Although the lower layers of a deep neural network learn features which ...

01/22/2019 - Self-Attention Networks for Connectionist Temporal Classification in Speech Recognition
Self-attention has demonstrated great success in sequence-to-sequence ta...

04/14/2021 - Pose Recognition with Cascade Transformers
In this paper, we present a regression-based pose recognition method usi...

08/09/2021 - TransForensics: Image Forgery Localization with Dense Self-Attention
Nowadays advanced image editing tools and technical skills produce tampe...

07/19/2022 - Relational Future Captioning Model for Explaining Likely Collisions in Daily Tasks
Domestic service robots that support daily tasks are a promising solutio...

09/01/2022 - Deep Sparse Conformer for Speech Recognition
Conformer has achieved impressive results in Automatic Speech Recognitio...
