Leveraging Visual Question Answering to Improve Text-to-Image Synthesis

10/28/2020
by Stanislav Frolov, et al.

Generating images from textual descriptions has recently attracted considerable interest. While current models can generate photo-realistic images of individual objects such as birds and human faces, synthesising images that contain multiple objects remains difficult. In this paper, we propose an effective way to combine Text-to-Image (T2I) synthesis with Visual Question Answering (VQA), leveraging the VQA 2.0 dataset to improve both the image quality and the image-text alignment of generated images. We create additional training samples by concatenating question and answer (QA) pairs, and we employ a standard VQA model to provide the T2I model with an auxiliary learning signal. We encourage images generated from QA pairs to look realistic and additionally minimize an external VQA loss. Our method lowers the Fréchet Inception Distance (FID) from 27.84 to 25.38 and increases the R-precision over the baseline score of 83.82, which indicates that T2I synthesis can successfully be improved using a standard VQA model.
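The two ingredients described above, turning a QA pair into a text input and adding a VQA loss to the generator objective, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the whitespace join used by the hypothetical `qa_to_caption`, the single-label softmax cross-entropy used as the VQA loss, and the hypothetical weight `lambda_vqa` are all choices made here for clarity. The VQA model's answer logits and the adversarial loss are taken as plain numbers so the example runs without a deep-learning framework.

```python
import math
from typing import Sequence


def qa_to_caption(question: str, answer: str) -> str:
    """Concatenate a question and its answer into one text input.

    Assumption: a plain whitespace join; the exact formatting is hypothetical.
    """
    return f"{question.strip()} {answer.strip()}"


def softmax_cross_entropy(logits: Sequence[float], target: int) -> float:
    """Cross-entropy of answer logits against the ground-truth answer index.

    Assumption: single-label softmax loss, used here for simplicity.
    Uses the log-sum-exp trick for numerical stability.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def generator_loss(adv_loss: float,
                   vqa_logits: Sequence[float],
                   answer_idx: int,
                   lambda_vqa: float = 1.0) -> float:
    """Adversarial loss plus a weighted auxiliary VQA loss.

    `vqa_logits` stand in for the VQA model's predictions on an image
    generated from the QA caption; `lambda_vqa` is a hypothetical weight.
    """
    return adv_loss + lambda_vqa * softmax_cross_entropy(vqa_logits, answer_idx)


if __name__ == "__main__":
    caption = qa_to_caption("What color is the bus?", "red")
    print(caption)  # What color is the bus? red
    print(generator_loss(0.5, [0.0, 0.0], answer_idx=1, lambda_vqa=2.0))
```

Keeping the VQA term as a separate weighted summand means the auxiliary signal can be tuned or switched off (`lambda_vqa=0`) without touching the adversarial objective.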
