Hard-Coded Gaussian Attention for Neural Machine Translation

05/02/2020
by Weiqiu You, et al.

Recent work has questioned the importance of the Transformer's multi-headed attention for achieving high translation quality. We push further in this direction by developing a "hard-coded" attention variant without any learned parameters. Surprisingly, replacing all learned self-attention heads in the encoder and decoder with fixed, input-agnostic Gaussian distributions minimally impacts BLEU scores across four different language pairs. However, additionally hard-coding cross attention (which connects the decoder to the encoder) significantly lowers BLEU, suggesting that it is more important than self-attention. Much of this BLEU drop can be recovered by adding just a single learned cross attention head to an otherwise hard-coded Transformer. Taken as a whole, our results offer insight into which components of the Transformer are actually important, which we hope will guide future work into the development of simpler and more efficient attention-based models.
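To make the core idea concrete, the sketch below builds a fixed, input-agnostic attention matrix in which each query position attends over other positions with a Gaussian centered at a constant offset from itself. This is an illustrative reconstruction rather than the authors' code: the function name, the particular offset and standard deviation values, and the use of NumPy are assumptions made for the example.

```python
import numpy as np

def gaussian_attention_weights(seq_len, center_offset=0, std=1.0):
    """Input-agnostic attention: row i is a Gaussian over key positions,
    centered at i + center_offset with a fixed standard deviation."""
    pos = np.arange(seq_len)
    # dist[i, j] = distance of key position j from the fixed center i + center_offset
    dist = pos[None, :] - (pos[:, None] + center_offset)
    scores = -0.5 * (dist / std) ** 2            # log of an unnormalized Gaussian
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)  # rows sum to 1

# Example: a 5-token sequence with a head that focuses on the previous token.
attn = gaussian_attention_weights(5, center_offset=-1, std=1.0)
values = np.random.randn(5, 512)                 # stand-in for the layer's value vectors
context = attn @ values                          # same attention matrix for every input
```

Because the weights depend only on the sequence length and the head's fixed offset, not on the tokens themselves, they can be precomputed and reused across inputs, which is what makes such hard-coded heads simpler than learned ones.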


Related research

10/06/2020
Efficient Inference For Neural Machine Translation
Large Transformer models have achieved state-of-the-art results in neura...

02/24/2020
Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation
Transformer-based models have brought a radical change to neural machine...

09/21/2019
Self-attention based end-to-end Hindi-English Neural Machine Translation
Machine Translation (MT) is a zone of concentrate in Natural Language pr...

10/13/2021
Semantics-aware Attention Improves Neural Machine Translation
The integration of syntactic structures into Transformer machine transla...

05/23/2019
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
Multi-head self-attention is a key component of the Transformer, a state...

09/20/2022
Relaxed Attention for Transformer Models
The powerful modeling capabilities of all-attention-based transformer ar...

11/10/2019
Two-Headed Monster And Crossed Co-Attention Networks
This paper presents some preliminary investigations of a new co-attentio...
