Image Captioning with Deep Bidirectional LSTMs

04/04/2016
by   Cheng Wang, et al.
0

This work presents an end-to-end trainable deep bidirectional LSTM (Long-Short Term Memory) model for image captioning. Our model builds on a deep convolutional neural network (CNN) and two separate LSTM networks. It is capable of learning long term visual-language interactions by making use of history and future context information at high level semantic space. Two novel deep bidirectional variant models, in which we increase the depth of nonlinearity transition in different way, are proposed to learn hierarchical visual-language embeddings. Data augmentation techniques such as multi-crop, multi-scale and vertical mirror are proposed to prevent overfitting in training deep models. We visualize the evolution of bidirectional LSTM internal states over time and qualitatively analyze how our models "translate" image to sentence. Our proposed models are evaluated on caption generation and image-sentence retrieval tasks with three benchmark datasets: Flickr8K, Flickr30K and MSCOCO datasets. We demonstrate that bidirectional LSTM models achieve highly competitive performance to the state-of-the-art results on caption generation even without integrating additional mechanism (e.g. object detection, attention model etc.) and significantly outperform recent methods on retrieval task.

READ FULL TEXT

page 6

page 7

page 9

research
08/17/2017

Incorporating Copying Mechanism in Image Captioning for Learning Novel Objects

Image captioning often requires a large set of training image-sentence p...
research
02/22/2021

Image Captioning using Deep Stacked LSTMs, Contextual Word Embeddings and Data Augmentation

Image Captioning, or the automatic generation of descriptions for images...
research
10/15/2019

Exploring Overall Contextual Information for Image Captioning in Human-Like Cognitive Style

Image captioning is a research hotspot where encoder-decoder models comb...
research
02/03/2018

Densely Connected Bidirectional LSTM with Applications to Sentence Classification

Deep neural networks have recently been shown to achieve highly competit...
research
02/25/2021

Retrieval Augmentation to Improve Robustness and Interpretability of Deep Neural Networks

Deep neural network models have achieved state-of-the-art results in var...
research
03/25/2021

Improving Online Forums Summarization via Unifying Hierarchical Attention Networks with Convolutional Neural Networks

Online discussion forums are prevalent and easily accessible, thus allow...
research
12/05/2018

Text Data Augmentation Made Simple By Leveraging NLP Cloud APIs

In practice, it is common to find oneself with far too little text data ...

Please sign up or login with your details

Forgot password? Click here to reset