Transformer-based Cascaded Multimodal Speech Translation

10/29/2019
by Zixiu Wu, et al.

This paper describes the cascaded multimodal speech translation systems developed by Imperial College London for the IWSLT 2019 evaluation campaign. The architecture consists of an automatic speech recognition (ASR) system followed by a Transformer-based multimodal machine translation (MMT) system. The ASR component is identical across all experiments, whereas the MMT model varies along three dimensions: how the visual context is integrated (simple conditioning vs. attention), which type of visual features is used (pooled, convolutional, or action categories), and the underlying architecture. For the architecture, we explore both the canonical Transformer and its deliberation version, the latter in additive and cascade variants that differ in how they integrate the textual attention. Extensive experiments show that (i) the explored visual integration schemes often harm translation performance for the Transformer and additive deliberation, but considerably improve cascade deliberation; and (ii) the Transformer and cascade deliberation integrate the visual modality better than additive deliberation, as shown by the incongruence analysis.
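The cascade described in the abstract can be illustrated with a minimal Python sketch: one fixed ASR stage whose transcript feeds an interchangeable MMT stage. Every name here (`CascadedSpeechTranslator`, `toy_asr`, `toy_text_only_mmt`, `toy_visual_mmt`) is hypothetical and the stages are toy stand-ins, not the authors' models. The sketch only shows the composition pattern, under the assumption that the MMT stage takes a transcript plus optional visual features.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Hypothetical type aliases for illustration; none of these names come from the paper.
Audio = Sequence[float]
VisualFeatures = Sequence[float]
Transcriber = Callable[[Audio], str]
Translator = Callable[[str, Optional[VisualFeatures]], str]


@dataclass(frozen=True)
class CascadedSpeechTranslator:
    """Two-stage cascade: speech -> source transcript -> target translation."""

    asr: Transcriber  # kept identical across experiments
    mmt: Translator   # swapped to compare visual integration schemes

    def translate(self, audio: Audio, visual: Optional[VisualFeatures] = None) -> str:
        transcript = self.asr(audio)         # stage 1: speech recognition
        return self.mmt(transcript, visual)  # stage 2: (multimodal) translation


# Toy stand-ins so the sketch runs; real systems would be trained models.
def toy_asr(audio: Audio) -> str:
    return "a man rides a bike"


def toy_text_only_mmt(transcript: str, visual: Optional[VisualFeatures]) -> str:
    # Ignores the visual input entirely (text-only baseline).
    return f"<tgt> {transcript}"


def toy_visual_mmt(transcript: str, visual: Optional[VisualFeatures]) -> str:
    # Only records whether visual features were supplied.
    tag = "with-visual" if visual else "no-visual"
    return f"<tgt:{tag}> {transcript}"


baseline = CascadedSpeechTranslator(asr=toy_asr, mmt=toy_text_only_mmt)
multimodal = CascadedSpeechTranslator(asr=toy_asr, mmt=toy_visual_mmt)
```

Because both systems share the same `asr` callable, any difference in their output comes from the MMT stage alone, which mirrors the experimental setup in which only the MMT model varies.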

