Towards Unique and Informative Captioning of Images

09/08/2020
by Zeyu Wang, et al.

Despite considerable progress, state-of-the-art image captioning models produce generic captions that leave out important image details. Furthermore, these systems may even misrepresent the image in order to produce a simpler caption consisting of common concepts. In this paper, we first analyze both modern captioning systems and evaluation metrics through empirical experiments to quantify these phenomena. We find that modern captioning systems assign higher likelihoods to incorrect distractor sentences than to ground-truth captions, and that evaluation metrics like SPICE can be 'topped' by simple captioning systems that rely on object detectors. Inspired by these observations, we design a new metric (SPICE-U) by introducing a notion of uniqueness over the concepts generated in a caption. We show that SPICE-U correlates better with human judgements than SPICE does, and effectively captures notions of diversity and descriptiveness. Finally, we demonstrate a general technique to improve any existing captioning model: using mutual information as a re-ranking objective during decoding. Empirically, this results in more unique and informative captions, and improves three different state-of-the-art models on SPICE-U as well as on the average score over existing metrics.
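
The abstract does not spell out the re-ranking objective, so the following is only a minimal sketch of one common way to re-rank by mutual information, not the authors' implementation. It assumes each candidate caption c for image I comes with a conditional log-likelihood log p(c | I) from the captioning model and an unconditional log-likelihood log p(c) from a language model, and scores it by the pointwise mutual information log p(c | I) - lam * log p(c). The function name, the weight `lam`, and all numeric values are hypothetical and chosen for illustration.

```python
from typing import List, Tuple

# Each candidate is (caption, log p(caption | image), log p(caption)).
Candidate = Tuple[str, float, float]


def rerank_by_mutual_information(
    candidates: List[Candidate], lam: float = 1.0
) -> List[Tuple[str, float]]:
    """Sort candidates by log p(c | I) - lam * log p(c), highest first.

    lam is a hypothetical weight: lam = 0 recovers plain likelihood
    ranking, larger values penalize captions that are likely for any image.
    """
    scored = [
        (caption, lp_cond - lam * lp_prior)
        for caption, lp_cond, lp_prior in candidates
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up log-probabilities: the generic caption is more likely under the
# captioning model, but it is also far more likely under the language model.
candidates = [
    ("a man riding a bike", -4.0, -5.0),
    ("a man in a red helmet riding a bike past a bakery", -9.0, -16.0),
]

ranked = rerank_by_mutual_information(candidates)
for caption, score in ranked:
    print(f"{score:6.2f}  {caption}")
```

With these illustrative numbers, plain likelihood ranking (`lam=0.0`) prefers the generic caption, while the mutual-information score prefers the more specific one, because the penalty on log p(c) discounts captions that would fit almost any image.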
