Captioning Images with Diverse Objects

06/24/2016
by Subhashini Venugopalan, et al.

Recent captioning models are limited in their ability to scale and describe concepts unseen in paired image-text corpora. We propose the Novel Object Captioner (NOC), a deep visual semantic captioning model that can describe a large number of object categories not present in existing image-caption datasets. Our model takes advantage of external sources -- labeled images from object recognition datasets, and semantic knowledge extracted from unannotated text. We propose minimizing a joint objective which can learn from these diverse data sources and leverage distributional semantic embeddings, enabling the model to generalize and describe novel objects outside of image-caption datasets. We demonstrate that our model exploits semantic information to generate captions for hundreds of object categories in the ImageNet object recognition dataset that are not observed in MSCOCO image-caption training data, as well as many categories that are observed very rarely. Both automatic evaluations and human judgements show that our model considerably outperforms prior work in being able to describe many more categories of objects.
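
The joint objective mentioned in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that each data source (labeled images, unannotated text, paired image-caption data) contributes an average cross-entropy term and that the terms are combined as a weighted sum. The function names, the weights, and the toy batches are hypothetical and are not taken from the paper.

```python
import math

def cross_entropy(probs, target):
    """Negative log-likelihood of the target index under a probability vector."""
    return -math.log(probs[target])

def mean_loss(batch):
    """Average cross-entropy over a batch of (probs, target) pairs."""
    return sum(cross_entropy(p, t) for p, t in batch) / len(batch)

def joint_objective(image_batch, text_batch, caption_batch,
                    weights=(1.0, 1.0, 1.0)):
    """Weighted sum of three per-source losses (an assumed form).

    image_batch:   predictions on labeled images from a recognition dataset
    text_batch:    next-word predictions on unannotated text
    caption_batch: next-word predictions on paired image-caption data
    """
    w_img, w_txt, w_cap = weights
    return (w_img * mean_loss(image_batch)
            + w_txt * mean_loss(text_batch)
            + w_cap * mean_loss(caption_batch))

# Toy batches: each entry is (predicted distribution, index of correct class/word).
image_batch = [([0.5, 0.5], 0)]
text_batch = [([0.5, 0.5], 1)]
caption_batch = [([0.5, 0.5], 0)]

loss = joint_objective(image_batch, text_batch, caption_batch)
print(f"joint loss: {loss:.4f}")
```

Minimizing a single summed loss of this kind lets gradients from all three sources update shared parameters, which is one plausible way a model could learn about objects that never appear in the paired caption data.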

research
11/17/2015

Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data

While recent deep neural network models have achieved promising results ...
research
10/17/2017

Describing Natural Images Containing Novel Objects with Knowledge Guided Assitance

Images in the wild encapsulate rich knowledge about varied abstract conc...
research
01/29/2019

Semantic Redundancies in Image-Classification Datasets: The 10% You Don't Need

Large datasets have been crucial to the success of deep learning models ...
research
01/29/2023

Diverse, Difficult, and Odd Instances (D2O): A New Test Set for Object Classification

Test sets are an integral part of evaluating models and gauging progress...
research
12/19/2016

Few-Shot Object Recognition from Machine-Labeled Web Images

With the tremendous advances of Convolutional Neural Networks (ConvNets)...
research
03/28/2022

NOC-REK: Novel Object Captioning with Retrieved Vocabulary from External Knowledge

Novel object captioning aims at describing objects absent from training ...
research
12/15/2022

Objaverse: A Universe of Annotated 3D Objects

Massive data corpora like WebText, Wikipedia, Conceptual Captions, WebIm...
