Gumbel-Attention for Multi-modal Machine Translation

03/16/2021
by   Pengbo Liu, et al.

Multi-modal machine translation (MMT) improves translation quality by introducing visual information. However, existing MMT models ignore the fact that an image also carries information irrelevant to the text, which introduces considerable noise into the model and degrades translation quality. In this paper, we propose a novel Gumbel-Attention for multi-modal machine translation, which selects the text-related parts of the image features. Specifically, unlike previous attention-based methods, we first use a differentiable method to select the image information and automatically remove the useless parts of the image features. The image-aware text representation is then generated from the score matrix of Gumbel-Attention and the image features. Next, we independently encode the text representation and the image-aware text representation with the multi-modal encoder. Finally, the output of the encoder is obtained through multi-modal gated fusion. Experiments and case analysis show that our method retains the image features related to the text, and that the retained parts help the MMT model generate better translations.
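
The pipeline described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no formulas, so the scaled dot-product scoring, the temperature `tau`, the sigmoid gate, the residual form of the fusion, and all function and parameter names (`gumbel_softmax`, `gumbel_attention`, `gated_fusion`, `w_gate`) are assumptions made for illustration. The multi-modal encoder that sits between the attention and fusion steps is omitted.

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Row-wise relaxed selection: softmax((logits + Gumbel noise) / tau)."""
    rng = np.random.default_rng() if rng is None else rng
    # Clip uniform samples away from 0 and 1 so both logarithms stay finite.
    u = np.clip(rng.random(logits.shape), 1e-9, 1.0 - 1e-9)
    gumbel = -np.log(-np.log(u))
    y = (logits + gumbel) / tau
    y = y - y.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)

def gumbel_attention(text, image, tau=1.0, rng=None):
    """Assumed form: text tokens score image regions, Gumbel-softmax selects.

    text:  (n_tokens, d) text representations
    image: (n_regions, d) image region features
    Returns the image-aware text representation and the score matrix.
    """
    d = text.shape[-1]
    scores = text @ image.T / np.sqrt(d)          # (n_tokens, n_regions)
    weights = gumbel_softmax(scores, tau, rng)    # lower tau -> sharper selection
    return weights @ image, weights

def gated_fusion(text, image_aware, w_gate):
    """Assumed gate: sigmoid over the concatenated inputs scales the visual part."""
    concat = np.concatenate([text, image_aware], axis=-1)  # (n_tokens, 2d)
    gate = 1.0 / (1.0 + np.exp(-(concat @ w_gate)))        # (n_tokens, d)
    return text + gate * image_aware

# Toy usage with random features (shapes are hypothetical).
rng = np.random.default_rng(0)
text = rng.normal(size=(4, 8))     # 4 tokens, dimension 8
image = rng.normal(size=(5, 8))    # 5 image regions, dimension 8
image_aware, weights = gumbel_attention(text, image, tau=0.5, rng=rng)
fused = gated_fusion(text, image_aware, rng.normal(size=(16, 8)))
```

In this sketch, lowering `tau` pushes each row of `weights` toward a one-hot choice of image region, which is one way a differentiable selection can suppress regions unrelated to a given token.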

