Better Captioning with Sequence-Level Exploration

03/08/2020
by   Jia Chen, et al.
0

Sequence-level learning objective has been widely used in captioning tasks to achieve the state-of-the-art performance for many models. In this objective, the model is trained by the reward on the quality of its generated captions (sequence-level). In this work, we show the limitation of the current sequence-level learning objective for captioning tasks from both theory and empirical result. In theory, we show that the current objective is equivalent to only optimizing the precision side of the caption set generated by the model and therefore overlooks the recall side. Empirical result shows that the model trained by this objective tends to get lower score on the recall side. We propose to add a sequence-level exploration term to the current objective to boost recall. It guides the model to explore more plausible captions in the training. In this way, the proposed objective takes both the precision and recall sides of generated captions into account. Experiments show the effectiveness of the proposed method on both video and image captioning datasets.

READ FULL TEXT

page 1

page 4

page 8

research
03/12/2018

Discriminability objective for training descriptive captions

One property that remains lacking in image captions generated by contemp...
research
02/02/2023

IC^3: Image Captioning by Committee Consensus

If you ask a human to describe an image, they might do so in a thousand ...
research
07/31/2023

Guiding Image Captioning Models Toward More Specific Captions

Image captioning is conventionally formulated as the task of generating ...
research
09/27/2018

Vector Learning for Cross Domain Representations

Recently, generative adversarial networks have gained a lot of popularit...
research
05/13/2022

Joint Generation of Captions and Subtitles with Dual Decoding

As the amount of audio-visual content increases, the need to develop aut...
research
04/06/2020

B-SCST: Bayesian Self-Critical Sequence Training for Image Captioning

Bayesian deep neural networks (DNN) provide a mathematically grounded fr...
research
07/31/2019

ShapeCaptioner: Generative Caption Network for 3D Shapes by Learning a Mapping from Parts Detected in Multiple Views to Sentences

3D shape captioning is a challenging application in 3D shape understandi...

Please sign up or login with your details

Forgot password? Click here to reset