State Spaces Aren't Enough: Machine Translation Needs Attention

04/25/2023
by Ali Vardasbi et al.

Structured State Spaces for Sequences (S4) is a recently proposed sequence model with successful applications in various tasks, e.g., vision, language modeling, and audio. Thanks to its mathematical formulation, it compresses its input into a single hidden state and can capture long-range dependencies without an attention mechanism. In this work, we apply S4 to Machine Translation (MT) and evaluate several encoder-decoder variants on WMT'14 and WMT'16. In contrast with its success in language modeling, we find that S4 lags behind the Transformer by approximately 4 BLEU points and, counter-intuitively, struggles with long sentences. Finally, we show that this gap is caused by S4's inability to summarize the full source sentence in a single hidden state, and that introducing an attention mechanism closes it.
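The abstract contrasts two ways for a decoder to access the source: S4's single compressed hidden state versus an attention readout over every source position. The sketch below is a minimal illustration of that contrast, not the paper's implementation; all names, shapes, and toy parameters are assumptions made for exposition. It pairs a discretized linear state-space recurrence, x_k = A x_{k-1} + B u_k with readout y_k = C x_k, against a single-head scaled dot-product cross-attention.

```python
# Minimal sketch (illustrative assumptions, not the authors' code) of the
# two source-access mechanisms the abstract contrasts.
import numpy as np

def ssm_recurrence(u, A, B, C):
    """Discretized linear state space: x_k = A x_{k-1} + B u_k, y_k = C x_k.

    The final state x_L is the only summary of the source handed forward,
    which is the bottleneck the paper identifies for MT.
    """
    d_state = A.shape[0]
    x = np.zeros(d_state)
    ys = []
    for u_k in u:                 # u: (L, d_in) source embeddings
        x = A @ x + B @ u_k       # state update folds all history into x
        ys.append(C @ x)          # per-step readout
    return np.stack(ys), x       # per-position outputs and the final state

def cross_attention(query, keys, values):
    """Single-head scaled dot-product attention over all source positions."""
    scores = keys @ query / np.sqrt(query.shape[-1])   # (L,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                           # softmax over source
    return weights @ values                            # weighted sum; no single-state bottleneck

# Toy usage with assumed sizes.
rng = np.random.default_rng(0)
L, d_in, d_state = 5, 4, 8
u = rng.normal(size=(L, d_in))
A = np.eye(d_state) * 0.9                  # stable toy transition
B = rng.normal(size=(d_state, d_in)) * 0.1
C = rng.normal(size=(d_in, d_state)) * 0.1
ys, final_state = ssm_recurrence(u, A, B, C)
context = cross_attention(rng.normal(size=d_in), ys, ys)
```

The recurrence gives the decoder only the fixed-size final state, no matter how long the source is, while attention keeps every source position addressable. That difference matches the paper's explanation of why adding attention closes the gap.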


Related research

Is Encoder-Decoder Redundant for Neural Machine Translation? (10/21/2022)
Encoder-decoder architecture is widely adopted for sequence-to-sequence ...

SRU++: Pioneering Fast Recurrence with Attention for Speech Recognition (10/11/2021)
The Transformer architecture has been well adopted as a dominant archite...

Attention as a guide for Simultaneous Speech Translation (12/15/2022)
The study of the attention mechanism has sparked interest in many fields...

Efficient Long Sequence Modeling via State Space Augmented Transformer (12/15/2022)
Transformer models have achieved superior performance in various natural...

Mega: Moving Average Equipped Gated Attention (09/21/2022)
The design choices in the Transformer attention mechanism, including wea...

Global-Context Neural Machine Translation through Target-Side Attentive Residual Connections (09/14/2017)
Neural sequence-to-sequence models achieve remarkable performance not on...

Neural Shuffle-Exchange Networks - Sequence Processing in O(n log n) Time (07/18/2019)
A key requirement in sequence to sequence processing is the modeling of ...
