A Semi-supervised Framework for Image Captioning

11/16/2016
by Wenhu Chen, et al.

State-of-the-art approaches for image captioning require supervised training data consisting of captions paired with images. These methods are typically unable to use unsupervised data, such as text with no corresponding images, which is far more abundant. We propose a novel way of using such textual data by artificially generating the missing visual information. We evaluate this learning approach on a newly designed model that detects the visual concepts present in an image and feeds them to a reviewer-decoder architecture with an attention mechanism. Unlike previous approaches that encode visual concepts using word embeddings, we suggest using regional image features, which capture more intrinsic information. The main benefit of this architecture is that it synthesizes meaningful thought vectors that capture salient image properties and then applies a soft attentive decoder to decode the thought vectors and generate image captions. We evaluate our model on both the Microsoft COCO and Flickr30K datasets and demonstrate that this model, combined with our semi-supervised learning method, substantially improves performance and helps the model generate more accurate and diverse captions.
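To make the "soft attentive decoder" idea concrete, the following is a minimal sketch of one soft-attention step over a set of thought vectors. It is an illustration only: the dot-product scoring, the function names `softmax` and `soft_attention`, and the toy vectors are assumptions made here, not details taken from the paper, which may use a different scoring function and learned parameters.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_attention(query, thought_vectors):
    """One soft-attention step (hypothetical helper, dot-product scoring assumed).

    query: decoder state as a list of floats
    thought_vectors: list of vectors, each the same length as query
    Returns the attention weights and the weighted-sum context vector.
    """
    # Score each thought vector against the decoder state.
    scores = [sum(q * v for q, v in zip(query, vec)) for vec in thought_vectors]
    # Normalize scores into a probability distribution.
    weights = softmax(scores)
    # The context is the attention-weighted average of the thought vectors.
    dim = len(thought_vectors[0])
    context = [
        sum(w * vec[i] for w, vec in zip(weights, thought_vectors))
        for i in range(dim)
    ]
    return weights, context

# Toy example: three 2-dimensional thought vectors and one decoder state.
thought_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
weights, context = soft_attention(query, thought_vectors)
```

In this toy case the first and third vectors align equally well with the query, so they receive equal weight, and the second receives less. A real decoder would recompute the weights at every output word using its current hidden state.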

research
09/05/2019

Image Captioning with Very Scarce Supervised Data: Adversarial Semi-Supervised Learning Approach

Constructing an organized dataset comprised of a large number of images ...
research
06/21/2021

TCIC: Theme Concepts Learning Cross Language and Vision for Image Captioning

Existing research for image captioning usually represents an image using...
research
01/26/2023

Semi-Supervised Image Captioning by Adversarially Propagating Labeled Data

We present a novel data-efficient semi-supervised framework to improve t...
research
05/28/2023

FuseCap: Leveraging Large Language Models to Fuse Visual Data into Enriched Image Captions

Image captioning is a central task in computer vision which has experien...
research
09/18/2020

Image Captioning with Attention for Smart Local Tourism using EfficientNet

Smart systems have been massively developed to help humans in various ta...
research
11/02/2020

Boost Image Captioning with Knowledge Reasoning

Automatically generating a human-like description for a given image is a...
research
08/27/2018

simNet: Stepwise Image-Topic Merging Network for Generating Detailed and Comprehensive Image Captions

The encode-decoder framework has shown recent success in image captionin...
