Exploring Semantic Relationships for Unpaired Image Captioning

06/20/2021
by   Fenglin Liu, et al.

Recently, image captioning has attracted great interest in both academia and industry. Most existing systems are built on large-scale datasets of image-sentence pairs, which are time-consuming to construct. In addition, even for the most advanced image captioning systems, deep image understanding remains difficult to achieve. In this work, we achieve unpaired image captioning by bridging the vision and language domains with high-level semantic information. The motivation stems from the fact that semantic concepts of the same modality can be extracted from both images and their descriptions. To further improve the quality of the generated captions, we propose the Semantic Relationship Explorer, which explores the relationships between semantic concepts for a better understanding of the image. Extensive experiments on the MSCOCO dataset show that we can generate desirable captions without paired data. Furthermore, the proposed approach boosts five strong baselines under the paired setting, where the most significant improvement in CIDEr score reaches 8%, showing that it is effective and generalizes well to a wide range of models.

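To make the idea concrete, below is a minimal sketch of how a module that models pairwise relationships between detected semantic concepts might look: a single self-attention layer over concept embeddings with a residual connection. The class name, dimensions, and structure are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SemanticRelationshipExplorer(nn.Module):
    """Hypothetical sketch: relates detected semantic concepts to one
    another via self-attention (names and shapes are assumptions)."""

    def __init__(self, embed_dim: int = 512):
        super().__init__()
        self.query = nn.Linear(embed_dim, embed_dim)
        self.key = nn.Linear(embed_dim, embed_dim)
        self.value = nn.Linear(embed_dim, embed_dim)
        self.out = nn.Linear(embed_dim, embed_dim)

    def forward(self, concepts: torch.Tensor) -> torch.Tensor:
        # concepts: (batch, num_concepts, embed_dim) embeddings of the
        # semantic concepts extracted from an image or a sentence.
        q, k, v = self.query(concepts), self.key(concepts), self.value(concepts)
        scores = q @ k.transpose(-2, -1) / (concepts.size(-1) ** 0.5)
        relation = torch.softmax(scores, dim=-1)   # concept-to-concept affinities
        attended = relation @ v                    # relation-aware concept features
        return self.out(attended) + concepts       # residual keeps original concepts

# Usage: a batch of 16 images, 10 detected concepts each, 512-d embeddings.
explorer = SemanticRelationshipExplorer(embed_dim=512)
features = explorer(torch.randn(16, 10, 512))
print(features.shape)  # torch.Size([16, 10, 512])
```

The relation-aware concept features could then serve as the shared high-level representation that bridges the image encoder and the caption decoder, which is the role the abstract assigns to semantic concepts.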

