Controllable Image Captioning

04/28/2022
by Luka Maxwell, et al.

State-of-the-art image captioners can generate accurate sentences that describe images in a sequence-to-sequence manner, but they do so without considering controllability or interpretability. This limits how widely image captioning can be used, because an image can be interpreted in countless ways depending on the target and the context at hand. Controllability is especially important when an image captioner is used by different people who interpret images in different ways. In this paper, we introduce a novel framework for image captioning that generates diverse descriptions by capturing the co-dependence between Part-Of-Speech (POS) tags and semantics. Our model decouples the direct dependence between successive variables. This allows the decoder to search exhaustively through the latent POS choices while keeping decoding speed proportional to the size of the POS vocabulary. Given a control signal in the form of a sequence of POS tags, we propose a method that generates captions with a Transformer network, which predicts words based on the input POS tag sequence. Experiments on publicly available datasets show that our model significantly outperforms state-of-the-art methods at generating diverse, high-quality image captions.
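
The idea of steering a caption with a POS tag sequence can be illustrated with a toy sketch. This is not the Transformer model described in the abstract: the function `caption_from_pos`, the `LEXICON` table, and the `SCORES` values are all hypothetical stand-ins, with fixed per-word scores playing the role of a learned image-conditioned word predictor. It only shows how different tag sequences for the same image can yield different captions.

```python
from typing import Dict, List


def caption_from_pos(
    tags: List[str],
    lexicon: Dict[str, List[str]],
    scores: Dict[str, float],
) -> List[str]:
    """Pick one word per POS tag, greedily by score (toy stand-in for a decoder)."""
    words: List[str] = []
    for tag in tags:
        # Prefer words not used yet so the toy caption does not repeat itself.
        candidates = [w for w in lexicon[tag] if w not in words]
        if not candidates:
            candidates = lexicon[tag]
        # Hypothetical relevance score; a real model would predict this from the image.
        words.append(max(candidates, key=lambda w: scores.get(w, 0.0)))
    return words


# Hypothetical vocabulary grouped by POS tag (assumption, not from the paper).
LEXICON = {
    "DET": ["a", "the"],
    "ADJ": ["brown", "small"],
    "NOUN": ["dog", "ball", "grass"],
    "VERB": ["chases", "sits"],
}

# Hypothetical word relevance scores for a single image (assumption).
SCORES = {
    "a": 0.9, "the": 0.8,
    "brown": 0.7, "small": 0.4,
    "dog": 0.95, "ball": 0.6, "grass": 0.5,
    "chases": 0.8, "sits": 0.3,
}

# Two different control signals for the same image.
caption_a = caption_from_pos(["DET", "NOUN", "VERB", "DET", "NOUN"], LEXICON, SCORES)
caption_b = caption_from_pos(["DET", "ADJ", "NOUN", "VERB"], LEXICON, SCORES)

print(" ".join(caption_a))
print(" ".join(caption_b))
```

Changing only the tag sequence changes the structure and content of the output, which is the kind of control signal the abstract refers to; a learned model would replace the fixed score table with predictions conditioned on the image and the preceding words.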

research
11/26/2018

Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions

Current captioning approaches can describe images using black-box archit...
research
05/31/2018

Diverse and Controllable Image Captioning with Part-of-Speech Guidance

Automatically describing an image is an important capability for virtual...
research
03/14/2019

Dense Relational Captioning: Triple-Stream Networks for Relationship-Based Captioning

Our goal in this work is to train an image captioning model that generat...
research
07/19/2020

Length-Controllable Image Captioning

The last decade has witnessed remarkable progress in the image captionin...
research
11/23/2016

Semantic Compositional Networks for Visual Captioning

A Semantic Compositional Network (SCN) is developed for image captioning...
research
09/07/2019

Look and Modify: Modification Networks for Image Captioning

Attention-based neural encoder-decoder frameworks have been widely used ...
research
12/02/2016

Guided Open Vocabulary Image Captioning with Constrained Beam Search

Existing image captioning models do not generalize well to out-of-domain...
