IMAGETOTEXT: IMAGE CAPTION GENERATION USING HYBRID RECURRENT NEURAL NETWORK

02/25/2021
by md-asifuzzaman-jishan, et al.
Generating a natural language description of an image is an important problem at the intersection of computer vision, natural language processing, artificial intelligence and image processing. Building on recent work in deep learning, we introduce a hybrid RNN model that generates text from input images. The model exploits the connections between natural language and visual data by producing line-based textual content for a given image. Our Hybrid Recurrent Neural Network model combines a Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM) and a Bi-directional Recurrent Neural Network (BRNN). We trained the model on three benchmark datasets, Flickr8K, Flickr30K and MS COCO, and observed improved accuracy compared with state-of-the-art work. We also created a new Bangla dataset, BNLIT (Bangla Natural Language Image to Text), for generating Bangla captions from input images. It contains 8,700 images, all taken from a Bangladeshi perspective. Our hybrid model learns from this new set of images and annotations, which reflect the Bangladeshi geographical context.
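The abstract does not say exactly how the CNN, LSTM and BRNN components are wired together. The pure-Python sketch below shows one plausible arrangement and is not the authors' implementation. It makes four assumptions: a random vector stands in for the CNN image feature, that feature is projected into the initial hidden state, the caption's word embeddings are read by an LSTM in both directions (the BRNN part), and the two directions' states are concatenated at each position. All names (`LSTMCell`, `encode_caption`, `W_img`), dimensions and random weights are hypothetical. Training and word prediction are omitted.

```python
import math
import random

def init_matrix(rng, rows, cols, scale=0.1):
    """Small random weight matrix (hypothetical initialisation)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class LSTMCell:
    """Standard LSTM cell with input (i), forget (f), output (o) gates and candidate (g)."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One weight matrix per gate, applied to the concatenation [x; h].
        self.W = {k: init_matrix(rng, hidden_dim, input_dim + hidden_dim) for k in "ifog"}
        self.b = {k: [0.0] * hidden_dim for k in "ifog"}

    def step(self, x, h, c):
        xh = x + h  # list concatenation, length input_dim + hidden_dim
        pre = {k: [a + b for a, b in zip(matvec(self.W[k], xh), self.b[k])] for k in "ifog"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        cand = [math.tanh(v) for v in pre["g"]]
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, cand)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new

def encode_caption(image_feature, token_embeddings, fwd, bwd, W_img):
    """Bidirectional LSTM pass over caption embeddings, conditioned on the image.

    Hypothetical conditioning: the image feature is projected by W_img and
    squashed with tanh to form the initial hidden state of both directions.
    """
    h0 = [math.tanh(v) for v in matvec(W_img, image_feature)]
    c0 = [0.0] * len(h0)

    # Forward direction: left-to-right over the tokens.
    forward_states = []
    h, c = h0, c0
    for x in token_embeddings:
        h, c = fwd.step(x, h, c)
        forward_states.append(h)

    # Backward direction: right-to-left, then re-aligned to token order.
    backward_states = []
    h, c = h0, c0
    for x in reversed(token_embeddings):
        h, c = bwd.step(x, h, c)
        backward_states.append(h)
    backward_states.reverse()

    # Each position now carries both left and right context.
    return [fs + bs for fs, bs in zip(forward_states, backward_states)]

# Toy usage with tiny, made-up dimensions.
rng = random.Random(0)
feat_dim, emb_dim, hidden = 8, 4, 3
image_feature = [rng.uniform(0.0, 1.0) for _ in range(feat_dim)]  # stand-in for CNN output
tokens = [[rng.uniform(-1.0, 1.0) for _ in range(emb_dim)] for _ in range(5)]  # 5 word embeddings
fwd = LSTMCell(emb_dim, hidden, rng)
bwd = LSTMCell(emb_dim, hidden, rng)
W_img = init_matrix(rng, hidden, feat_dim)
states = encode_caption(image_feature, tokens, fwd, bwd, W_img)
print(len(states), len(states[0]))  # 5 6
```

Concatenating the two directions doubles the state size (here 2 × 3 = 6) and gives each word position context from both sides of the sentence. A real system would use a trained CNN such as a pretrained image classifier, learned embeddings, and a softmax layer over the vocabulary on top of these states.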
