VALHALLA: Visual Hallucination for Machine Translation

05/31/2022
by   Yi Li, et al.

Designing better machine translation systems by considering auxiliary inputs such as images has attracted much attention in recent years. While existing methods show promising performance over conventional text-only translation systems, they typically require paired text and images as input during inference, which limits their applicability to real-world scenarios. In this paper, we introduce a visual hallucination framework, called VALHALLA, which requires only source sentences at inference time and instead uses hallucinated visual representations for multimodal machine translation. In particular, given a source sentence, an autoregressive hallucination transformer is used to predict a discrete visual representation from the input text, and the combined text and hallucinated representations are used to obtain the target translation. We train the hallucination transformer jointly with the translation transformer using standard backpropagation with cross-entropy losses, guided by an additional loss that encourages consistency between predictions made with either ground-truth or hallucinated visual representations. Extensive experiments on three standard translation datasets with a diverse set of language pairs demonstrate the effectiveness of our approach over both text-only baselines and state-of-the-art methods. Project page: http://www.svcl.ucsd.edu/projects/valhalla.
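
To make the training recipe concrete, the sketch below illustrates one plausible reading of the objective described in the abstract: a translation cross-entropy loss computed with ground-truth discrete visual tokens, a second translation loss computed with hallucinated tokens, an autoregressive cross-entropy loss for the hallucination transformer itself, and a consistency term between the two translation distributions. All module names, dimensions, and the symmetric-KL form of the consistency loss are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch of a VALHALLA-style joint training objective.
# Hypothetical module names; positional encodings and masking omitted for brevity.
import torch
import torch.nn.functional as F
from torch import nn

class ValhallaSketch(nn.Module):
    def __init__(self, vocab_size, visual_codebook_size, d_model=512):
        super().__init__()
        self.text_embed = nn.Embedding(vocab_size, d_model)
        self.visual_embed = nn.Embedding(visual_codebook_size, d_model)
        # Autoregressive hallucination transformer: source text -> discrete visual tokens.
        self.hallucinator = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True), num_layers=4)
        self.visual_head = nn.Linear(d_model, visual_codebook_size)
        # Multimodal translation transformer: (text + visual tokens) -> target tokens.
        self.translator = nn.Transformer(d_model, nhead=8, batch_first=True,
                                         num_encoder_layers=4, num_decoder_layers=4)
        self.target_head = nn.Linear(d_model, vocab_size)

    def translate_logits(self, src_tokens, visual_tokens, tgt_in):
        # Encoder input is the concatenation of text and (GT or hallucinated) visual embeddings.
        enc_in = torch.cat([self.text_embed(src_tokens),
                            self.visual_embed(visual_tokens)], dim=1)
        dec = self.translator(enc_in, self.text_embed(tgt_in))
        return self.target_head(dec)

    def forward(self, src_tokens, gt_visual_tokens, tgt_in, tgt_out):
        src = self.text_embed(src_tokens)

        # 1) Hallucination loss: predict ground-truth discrete visual tokens
        #    autoregressively from the source sentence (teacher forcing).
        vis_in = self.visual_embed(gt_visual_tokens[:, :-1])
        hall_logits = self.visual_head(self.hallucinator(vis_in, src))
        loss_hall = F.cross_entropy(hall_logits.flatten(0, 1),
                                    gt_visual_tokens[:, 1:].flatten())

        # 2) Translation losses with ground-truth and hallucinated visual tokens.
        #    argmax is non-differentiable; a straight-through or Gumbel-softmax
        #    relaxation could be used in practice (assumption).
        hall_tokens = hall_logits.argmax(-1)
        logits_gt = self.translate_logits(src_tokens, gt_visual_tokens[:, 1:], tgt_in)
        logits_hall = self.translate_logits(src_tokens, hall_tokens, tgt_in)
        loss_mt_gt = F.cross_entropy(logits_gt.flatten(0, 1), tgt_out.flatten())
        loss_mt_hall = F.cross_entropy(logits_hall.flatten(0, 1), tgt_out.flatten())

        # 3) Consistency loss between the two translation distributions
        #    (symmetric KL shown here as one plausible choice).
        p_gt = F.log_softmax(logits_gt, dim=-1)
        p_hall = F.log_softmax(logits_hall, dim=-1)
        loss_cons = 0.5 * (
            F.kl_div(p_hall, p_gt, log_target=True, reduction='batchmean') +
            F.kl_div(p_gt, p_hall, log_target=True, reduction='batchmean'))

        return loss_mt_gt + loss_mt_hall + loss_hall + loss_cons
```

In this reading, the hallucination path is what lets inference proceed from text alone: at test time the ground-truth visual tokens are simply unavailable, and the translation transformer consumes the hallucinated tokens instead.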

Related research

11/03/2019 · Machine Translation in Pronunciation Space
The research in machine translation community focus on translation in te...

09/21/2020 · Generative Imagination Elevates Machine Translation
There are thousands of languages on earth, but visual perception is shar...

07/30/2018 · Doubly Attentive Transformer Machine Translation
In this paper a doubly attentive transformer machine translation model (...

03/19/2022 · Neural Machine Translation with Phrase-Level Universal Visual Representations
Multimodal machine translation (MMT) aims to improve neural machine tran...

05/20/2023 · Scene Graph as Pivoting: Inference-time Image-free Unsupervised Multimodal Machine Translation with Visual Scene Hallucination
In this work, we investigate a more realistic unsupervised multimodal ma...

05/22/2023 · Improving Isochronous Machine Translation with Target Factors and Auxiliary Counters
To translate speech for automatic dubbing, machine translation needs to ...

08/29/2019 · Probing Representations Learned by Multimodal Recurrent and Transformer Models
Recent literature shows that large-scale language modeling provides exce...
