Show, Edit and Tell: A Framework for Editing Image Captions

by Fawaz Sammani, et al.

Most image captioning frameworks generate captions directly from images, learning a mapping from visual features to natural language. However, editing existing captions can be easier than generating new ones from scratch. Intuitively, when editing captions, a model is not required to learn information that is already present in the caption (i.e. sentence structure), enabling it to focus on fixing details (e.g. replacing repetitive words). This paper proposes a novel approach to image captioning based on iterative adaptive refinement of an existing caption. Specifically, our caption-editing model consists of two sub-modules: (1) EditNet, a language module with an adaptive copy mechanism (Copy-LSTM) and a Selective Copy Memory Attention mechanism (SCMA), and (2) DCNet, an LSTM-based denoising auto-encoder. These components enable our model to directly copy from and modify existing captions. Experiments demonstrate that our new approach achieves state-of-the-art performance on the MS COCO dataset both with and without sequence-level training.
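The idea of copying from an existing caption while still being able to modify it can be pictured with a small sketch. The code below is not the paper's Copy-LSTM or SCMA, whose details the abstract does not give. It only illustrates, under assumed names (`attend`, `gated_copy`, `gate_logit` are hypothetical), how a model might softly select a state from the existing caption and then blend it with a newly generated state through a learned gate.

```python
import math


def sigmoid(x):
    """Logistic function, used here to squash the gate into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def attend(query, memory):
    """Dot-product attention over the existing caption's states.

    memory: list of vectors, one per word of the existing caption (assumed).
    Returns the softmax weights and the weighted sum of the memory rows.
    """
    scores = [sum(q * m for q, m in zip(query, row)) for row in memory]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(memory[0])
    context = [sum(w * row[i] for w, row in zip(weights, memory))
               for i in range(dim)]
    return weights, context


def gated_copy(generated, copied, gate_logit):
    """Blend a copied state with a generated one.

    A gate near 1 keeps the existing caption's content (copy);
    a gate near 0 favours the newly generated content (edit).
    """
    g = sigmoid(gate_logit)
    return [g * c + (1.0 - g) * n for c, n in zip(copied, generated)]


# Toy example: two caption-word states, query aligned with the first one.
caption_states = [[1.0, 0.0], [0.0, 1.0]]
weights, copied = attend([5.0, 0.0], caption_states)
blended = gated_copy(generated=[0.0, 0.0], copied=copied, gate_logit=0.0)
```

In this toy setting the attention weights concentrate on the first caption state, and a gate logit of 0 gives an even mix of copied and generated content. In a trained model the gate would be predicted from the decoder state rather than set by hand.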




Look and Modify: Modification Networks for Image Captioning

Attention-based neural encoder-decoder frameworks have been widely used ...

Explicit Image Caption Editing

Given an image and a reference caption, the image caption editing task a...

Intrinsic Image Captioning Evaluation

The image captioning task is about to generate suitable descriptions fro...

Contrastive Semantic Similarity Learning for Image Captioning Evaluation with Intrinsic Auto-encoder

Automatically evaluating the quality of image captions can be very chall...

Expressing Visual Relationships via Language

Describing images with text is a fundamental problem in vision-language ...

Enhanced Modality Transition for Image Captioning

Image captioning model is a cross-modality knowledge discovery task, whi...

Boost Image Captioning with Knowledge Reasoning

Automatically generating a human-like description for a given image is a...

Code Repositories


Show, Edit and Tell: A Framework for Editing Image Captions, CVPR 2020

