Show, Edit and Tell: A Framework for Editing Image Captions

03/06/2020
by   Fawaz Sammani, et al.

Most image captioning frameworks generate captions directly from images, learning a mapping from visual features to natural language. However, editing existing captions can be easier than generating new ones from scratch. Intuitively, when editing captions, a model is not required to learn information that is already present in the caption (i.e. sentence structure), enabling it to focus on fixing details (e.g. replacing repetitive words). This paper proposes a novel approach to image captioning based on iterative adaptive refinement of an existing caption. Specifically, our caption-editing model consists of two sub-modules: (1) EditNet, a language module with an adaptive copy mechanism (Copy-LSTM) and a Selective Copy Memory Attention mechanism (SCMA), and (2) DCNet, an LSTM-based denoising auto-encoder. These components enable our model to directly copy from and modify existing captions. Experiments demonstrate that our new approach achieves state-of-the-art performance on the MS COCO dataset both with and without sequence-level training.
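The idea of copying from an existing caption while still being able to replace words can be illustrated with a minimal sketch. This is not the paper's Copy-LSTM or SCMA; it is a generic gated mixture of a copy distribution and a generation distribution, and the function name `mix_copy_and_generate`, the scalar `copy_gate`, and all numbers below are assumptions made up for illustration.

```python
from typing import Dict, List


def mix_copy_and_generate(
    caption_tokens: List[str],
    copy_attention: List[float],
    generate_probs: Dict[str, float],
    copy_gate: float,
) -> Dict[str, float]:
    """Blend a copy distribution with a generation distribution.

    Hypothetical helper (not from the paper):
    - caption_tokens / copy_attention: tokens of the existing caption and
      an attention weight per token (weights assumed to sum to 1).
    - generate_probs: a distribution over vocabulary words (sums to 1).
    - copy_gate: value in [0, 1]; 1 means "copy only", 0 means "generate only".
    """
    if len(caption_tokens) != len(copy_attention):
        raise ValueError("one attention weight is needed per caption token")
    if not 0.0 <= copy_gate <= 1.0:
        raise ValueError("copy_gate must lie in [0, 1]")

    # Start from the generation distribution, scaled by (1 - gate).
    mixed = {word: (1.0 - copy_gate) * p for word, p in generate_probs.items()}

    # Add the copy mass; repeated tokens accumulate their attention weights.
    for token, weight in zip(caption_tokens, copy_attention):
        mixed[token] = mixed.get(token, 0.0) + copy_gate * weight
    return mixed


# Illustrative numbers only: an existing caption and a toy vocabulary.
existing = ["a", "man", "riding", "a", "horse"]
attention = [0.1, 0.6, 0.1, 0.1, 0.1]
vocab_probs = {"a": 0.2, "man": 0.1, "woman": 0.5, "riding": 0.1, "horse": 0.1}

mixed = mix_copy_and_generate(existing, attention, vocab_probs, copy_gate=0.5)
best_word = max(mixed, key=mixed.get)
```

With a high gate the output stays close to the existing caption; with a low gate the model is free to substitute a different word, which is the intuition behind editing rather than regenerating.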

09/07/2019

Look and Modify: Modification Networks for Image Captioning

Attention-based neural encoder-decoder frameworks have been widely used ...
07/20/2022

Explicit Image Caption Editing

Given an image and a reference caption, the image caption editing task a...
12/14/2020

Intrinsic Image Captioning Evaluation

The image captioning task is about to generate suitable descriptions fro...
06/29/2021

Contrastive Semantic Similarity Learning for Image Captioning Evaluation with Intrinsic Auto-encoder

Automatically evaluating the quality of image captions can be very chall...
06/18/2019

Expressing Visual Relationships via Language

Describing images with text is a fundamental problem in vision-language ...
02/23/2021

Enhanced Modality Transition for Image Captioning

Image captioning model is a cross-modality knowledge discovery task, whi...
11/02/2020

Boost Image Captioning with Knowledge Reasoning

Automatically generating a human-like description for a given image is a...

Code Repositories

show-edit-tell

Show, Edit and Tell: A Framework for Editing Image Captions, CVPR 2020

