VITAL: A Visual Interpretation on Text with Adversarial Learning for Image Labeling

07/26/2019
by Tao Hu et al.

In this paper, we propose a novel way to interpret text information by extracting visual feature representations from multiple high-resolution, photo-realistic synthetic images generated by a text-to-image Generative Adversarial Network (GAN), with the goal of improving image labeling. First, we design a stacked Generative Multi-Adversarial Network (GMAN), StackGMAN++, a modified version of the state-of-the-art text-to-image GAN StackGAN++, to generate multiple synthetic images conditioned on a text under varying prior noise. We then extract deep visual features from the generated synthetic images to explore the underlying visual concepts of the text. Finally, we combine the image-level visual features, the text-level features, and the visual features derived from the synthetic images to predict labels for images. Experiments on two benchmark datasets clearly demonstrate the efficacy of the proposed approach.
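To make the three-stream pipeline concrete, here is a minimal sketch of how the image-level, text-level, and synthetic-image features could be fused for label prediction. It is an illustration under assumed interfaces (PyTorch; a generator exposing a hypothetical noise_dim attribute; pluggable image/text encoders; arbitrary feature dimensions), not the authors' released implementation.

```python
import torch
import torch.nn as nn


class VitalLabeler(nn.Module):
    """Sketch of the VITAL pipeline: fuse image-level, text-level, and
    synthetic-image visual features to predict labels.

    Module names, attributes (e.g. ``noise_dim``), and dimensions are
    illustrative assumptions, not the authors' code.
    """

    def __init__(self, generator, image_encoder, text_encoder,
                 img_dim=2048, txt_dim=1024, num_labels=80, num_samples=4):
        super().__init__()
        self.generator = generator          # StackGMAN++-style text-to-image GAN
        self.image_encoder = image_encoder  # CNN backbone, e.g. a ResNet trunk
        self.text_encoder = text_encoder    # sentence-embedding model
        self.num_samples = num_samples      # synthetic images per text
        self.classifier = nn.Linear(2 * img_dim + txt_dim, num_labels)

    def forward(self, image, text):
        img_feat = self.image_encoder(image)  # (B, img_dim)
        txt_feat = self.text_encoder(text)    # (B, txt_dim)

        # Draw several prior noise vectors conditioned on the same text so the
        # generator produces diverse synthetic views; average their features.
        syn_feats = []
        for _ in range(self.num_samples):
            noise = torch.randn(image.size(0), self.generator.noise_dim,
                                device=image.device)
            syn_img = self.generator(txt_feat, noise)
            syn_feats.append(self.image_encoder(syn_img))
        syn_feat = torch.stack(syn_feats).mean(dim=0)  # (B, img_dim)

        # Fuse the three feature streams and predict label logits.
        fused = torch.cat([img_feat, txt_feat, syn_feat], dim=-1)
        return self.classifier(fused)
```

Averaging the synthetic-image features is only one simple pooling choice; the paper's actual combination scheme may differ, and concatenation, attention, or max-pooling over the samples would slot into the same place.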


Related research

10/19/2017
StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks
Although Generative Adversarial Networks (GANs) have shown remarkable su...

05/25/2019
Beyond Visual Semantics: Exploring the Role of Scene Text in Image Understanding
Images with visual and scene text content are ubiquitous in everyday lif...

10/10/2011
Closed-Loop Learning of Visual Control Policies
In this paper we present a general, flexible framework for learning mapp...

03/04/2021
Robustness Evaluation of Stacked Generative Adversarial Networks using Metamorphic Testing
Synthesising photo-realistic images from natural language is one of the ...

03/09/2023
Visualizing Semiotics in Generative Adversarial Networks
We perform a set of experiments to demonstrate that images generated usi...

05/18/2019
Variational Hetero-Encoder Randomized Generative Adversarial Networks for Joint Image-Text Modeling
For bidirectional joint image-text modeling, we develop variational hete...

09/21/2020
MFIF-GAN: A New Generative Adversarial Network for Multi-Focus Image Fusion
Multi-Focus Image Fusion (MFIF) is one of the promising techniques to ob...
