Towards Multi-Lingual Visual Question Answering

09/12/2022
by Soravit Changpinyo, et al.

Visual Question Answering (VQA) has been studied primarily through the lens of the English language. Yet, tackling VQA in other languages in the same manner would require a considerable amount of resources. In this paper, we propose scalable solutions to multi-lingual visual question answering (mVQA) on both the data and modeling fronts. We first propose a translation-based framework for mVQA data generation that requires far less human annotation effort than the conventional approach of directly collecting questions and answers. We then apply our framework to the multi-lingual captions in the Crossmodal-3600 dataset and develop an efficient annotation protocol to create MAVERICS-XM3600 (MaXM), a test-only VQA benchmark in 7 diverse languages. Finally, we propose an approach to unified, extensible, open-ended, and end-to-end mVQA modeling and demonstrate strong performance in 13 languages.
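To make the translation-based idea concrete, below is a minimal Python sketch of such a data-generation pipeline: generate a question-answer pair once (e.g., from a caption), then machine-translate it into each target language instead of collecting annotations natively. The generate_qa and translate helpers are hypothetical placeholders for a question-generation model and a machine-translation system; this is an illustration under those assumptions, not the authors' actual implementation.

```python
"""Hypothetical sketch of translation-based mVQA data generation."""

from dataclasses import dataclass


@dataclass
class VQAExample:
    image_id: str
    question: str
    answer: str
    language: str


def generate_qa(caption: str) -> tuple[str, str]:
    # Placeholder: a real pipeline would apply a trained
    # question-generation model to derive a (question, answer)
    # pair from the caption.
    return (f"What does the image show? [{caption}]", caption)


def translate(text: str, target_lang: str) -> str:
    # Placeholder: a real pipeline would call a machine-translation
    # system here.
    return f"<{target_lang}> {text}"


def build_mvqa(image_id: str, caption: str,
               langs: list[str]) -> list[VQAExample]:
    """Generate one QA pair, then translate it into each target
    language -- far cheaper than collecting QA pairs natively."""
    question, answer = generate_qa(caption)
    return [
        VQAExample(image_id,
                   translate(question, lang),
                   translate(answer, lang),
                   lang)
        for lang in langs
    ]


if __name__ == "__main__":
    examples = build_mvqa("img_001", "a dog catching a frisbee",
                          ["fr", "hi", "zh"])
    for ex in examples:
        print(ex)
```

The key design point this illustrates is that annotation cost stays constant in the number of languages: only the one-time QA generation needs human-quality supervision, while each additional language costs only a translation pass (plus the lightweight verification protocol the paper describes).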

Related research

09/21/2017
Visual Question Generation as Dual Task of Visual Question Answering
Recently visual question answering (VQA) and visual question generation ...

10/24/2020
Beyond VQA: Generating Multi-word Answer and Rationale to Visual Questions
Visual Question Answering is a multi-modal task that aims to measure hig...

12/15/2021
3D Question Answering
Visual Question Answering (VQA) has witnessed tremendous progress in rec...

05/07/2023
OpenViVQA: Task, Dataset, and Multimodal Fusion Models for Visual Question Answering in Vietnamese
In recent years, visual question answering (VQA) has attracted attention...

01/24/2018
DVQA: Understanding Data Visualizations via Question Answering
Bar charts are an effective way for humans to convey information to each...

07/02/2020
Project PIAF: Building a Native French Question-Answering Dataset
Motivated by the lack of data for non-English languages, in particular f...

10/08/2019
Modulated Self-attention Convolutional Network for VQA
As new data-sets for real-world visual reasoning and compositional quest...
