Explain Images with Multimodal Recurrent Neural Networks

10/04/2014
by   Junhua Mao, et al.
0

In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model for generating novel sentence descriptions to explain the content of images. It directly models the probability distribution of generating a word given previous words and the image. Image descriptions are generated by sampling from this distribution. The model consists of two sub-networks: a deep recurrent neural network for sentences and a deep convolutional network for images. These two sub-networks interact with each other in a multimodal layer to form the whole m-RNN model. The effectiveness of our model is validated on three benchmark datasets (IAPR TC-12, Flickr 8K, and Flickr 30K). Our model outperforms the state-of-the-art generative method. In addition, the m-RNN model can be applied to retrieval tasks for retrieving images or sentences, and achieves significant performance improvement over the state-of-the-art methods which directly optimize the ranking objective function for retrieval.

READ FULL TEXT
research
12/20/2014

Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)

In this paper, we present a multimodal Recurrent Neural Network (m-RNN) ...
research
12/07/2014

Deep Visual-Semantic Alignments for Generating Image Descriptions

We present a model that generates natural language descriptions of image...
research
02/05/2016

Generate Image Descriptions based on Deep RNN and Memory Cells for Images Features

Generating natural language descriptions for images is a challenging tas...
research
03/23/2017

Recurrent Multimodal Interaction for Referring Image Segmentation

In this paper we are interested in the problem of image segmentation giv...
research
02/13/2019

Recurrent Neural Networks with Stochastic Layers for Acoustic Novelty Detection

In this paper, we adapt Recurrent Neural Networks with Stochastic Layers...
research
11/03/2015

Detecting Interrogative Utterances with Recurrent Neural Networks

In this paper, we explore different neural network architectures that ca...
research
02/20/2020

Expressing Objects just like Words: Recurrent Visual Embedding for Image-Text Matching

Existing image-text matching approaches typically infer the similarity o...

Please sign up or login with your details

Forgot password? Click here to reset