Multiformer: A Head-Configurable Transformer-Based Model for Direct Speech Translation

05/14/2022
by Gerard Sant, et al.

Transformer-based models have been achieving state-of-the-art results in several fields of Natural Language Processing. However, their direct application to speech tasks is not trivial. The nature of these sequences poses problems such as long sequence lengths and redundancy between adjacent tokens, so we believe that the regular self-attention mechanism might not be well suited for them. Different approaches have been proposed to overcome these problems, such as the use of efficient attention mechanisms. However, these methods usually come at a cost: a performance reduction caused by information loss. In this study, we present the Multiformer, a Transformer-based model that allows a different attention mechanism to be used on each head. By doing this, the model is able to bias the self-attention towards the extraction of more diverse token interactions, and the information loss is reduced. Finally, we perform an analysis of the head contributions, and we observe that architectures in which the relevance of all heads is uniformly distributed obtain better results. Our results show that mixing attention patterns along the different heads and layers outperforms our baseline by up to 0.7 BLEU.
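The head-configurable idea lends itself to a compact sketch. The following is a minimal PyTorch illustration and not the authors' implementation: the module name HeadConfigurableAttention, the two example head types ("full" softmax attention and a banded "local" attention), and the window parameter are assumptions introduced here for illustration; the paper mixes its own set of efficient attention mechanisms across heads.

```python
# Minimal sketch of multi-head attention where each head can use a
# different attention pattern. Illustrative only, not the paper's code.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class HeadConfigurableAttention(nn.Module):
    def __init__(self, d_model, head_types, window=16):
        super().__init__()
        self.n_heads = len(head_types)
        assert d_model % self.n_heads == 0
        self.d_head = d_model // self.n_heads
        self.head_types = head_types   # e.g. ["full", "local", ...] (assumed names)
        self.window = window           # half-width of the local attention band
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):
        B, T, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        def split(t):  # (B, T, d_model) -> (B, heads, T, d_head)
            return t.view(B, T, self.n_heads, self.d_head).transpose(1, 2)

        q, k, v = split(q), split(k), split(v)

        outputs = []
        for h, kind in enumerate(self.head_types):
            scores = q[:, h] @ k[:, h].transpose(-2, -1) / math.sqrt(self.d_head)
            if kind == "local":
                # mask out interactions farther than `window` tokens away
                idx = torch.arange(T, device=x.device)
                band = (idx[None, :] - idx[:, None]).abs() <= self.window
                scores = scores.masked_fill(~band, float("-inf"))
            attn = F.softmax(scores, dim=-1)
            outputs.append(attn @ v[:, h])   # per-head output: (B, T, d_head)

        y = torch.cat(outputs, dim=-1)       # concatenate heads as usual
        return self.out(y)

# usage: two full-attention heads mixed with two local heads
attn = HeadConfigurableAttention(d_model=256,
                                 head_types=["full", "full", "local", "local"])
x = torch.randn(2, 100, 256)                 # (batch, seq_len, d_model)
print(attn(x).shape)                         # torch.Size([2, 100, 256])
```

Note that mixing head types this way only changes how each head's attention scores are computed or masked; the projections and the concatenation of head outputs stay exactly as in standard multi-head attention, which is what makes the pattern configurable per head.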

