UMIC: An Unreferenced Metric for Image Captioning via Contrastive Learning

06/26/2021
by   Hwanhee Lee, et al.

Despite the success of text generation metrics such as BERTScore, it remains difficult to evaluate image captions without enough reference captions, because valid descriptions of the same image are highly diverse. In this paper, we introduce UMIC, an Unreferenced Metric for Image Captioning, which does not require reference captions to evaluate image captions. Building on Vision-and-Language BERT, we train UMIC to discriminate negative captions via contrastive learning. We also observe critical problems in the previous benchmark dataset (i.e., human annotations) for image captioning metrics, and introduce a new collection of human annotations on generated captions. We validate UMIC on four datasets, including our new dataset, and show that UMIC has a higher correlation with human judgments than all previous metrics that require multiple references. We release the benchmark dataset and pre-trained models to compute UMIC.
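To make the contrastive idea concrete, the following is a minimal sketch of a margin-based ranking loss over caption scores. It is an illustration only: the abstract does not state the exact loss, so the hinge formulation, the function name `contrastive_ranking_loss`, and the `margin` value are all assumptions, and the scores stand in for whatever a Vision-and-Language BERT scorer would output for an image paired with a positive or negative caption.

```python
def contrastive_ranking_loss(pos_scores, neg_scores, margin=0.2):
    """Mean hinge loss over (positive, negative) caption score pairs.

    Assumption: each pair holds the scorer's output for the same image
    with a matching caption (positive) and a corrupted one (negative).
    The loss is zero once the positive beats the negative by `margin`.
    """
    if len(pos_scores) != len(neg_scores):
        raise ValueError("pos_scores and neg_scores must have equal length")
    if not pos_scores:
        raise ValueError("at least one score pair is required")
    losses = [max(0.0, margin - (p - n)) for p, n in zip(pos_scores, neg_scores)]
    return sum(losses) / len(losses)


# Hypothetical scores: the first pair is well separated, the second is not.
loss = contrastive_ranking_loss([0.9, 0.5], [0.1, 0.5], margin=0.2)
```

In this sketch only the second pair contributes to the loss, since its positive and negative captions receive the same score; a trained scorer would be pushed to separate them.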


research
03/26/2020

Egoshots, an ego-vision life-logging dataset and semantic fidelity metric to evaluate diversity in image captioning models

Image captioning models have been able to generate grammatically correct...
research
10/06/2017

Contrastive Learning for Image Captioning

Image captioning, a popular topic in computer vision, has achieved subst...
research
06/12/2023

Scalable 3D Captioning with Pretrained Models

We introduce Cap3D, an automatic approach for generating descriptive tex...
research
03/15/2023

PR-MCS: Perturbation Robust Metric for MultiLingual Image Captioning

Vulnerability to lexical perturbation is a critical weakness of automati...
research
05/26/2022

Fine-grained Image Captioning with CLIP Reward

Modern image captioning models are usually trained with text similarity ...
research
04/30/2018

Improved Image Captioning with Adversarial Semantic Alignment

In this paper we propose a new conditional GAN for image captioning that...
research
11/19/2022

ArtELingo: A Million Emotion Annotations of WikiArt with Emphasis on Diversity over Language and Culture

This paper introduces ArtELingo, a new benchmark and dataset, designed t...
