Chittron: An Automatic Bangla Image Captioning System

09/02/2018
by Motiur Rahman, et al.

Automatic image caption generation aims to produce an accurate natural-language description of an image. However, Bangla, the fifth most widely spoken language in the world, lags considerably in research and development in this domain. Moreover, while there are many established data sets for image annotation in English, no such resource exists for Bangla yet. This paper therefore outlines the development of "Chittron", an automatic image captioning system in Bangla. To address the lack of data, a collection of 16,000 Bangladeshi contextual images has been gathered and manually annotated in Bangla. This data set is then used to train a model that integrates a pre-trained VGG16 image embedding model with stacked LSTM layers. Given an image as input, the model is trained to predict the caption one word at a time. The results show that the model has learned a working language model and generates quite accurate captions in many cases. The results are evaluated mainly qualitatively, although BLEU scores are also reported. A better result is expected with a bigger and more varied data set.
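As an illustration of what predicting a caption "one word at a time" can look like in practice, here is a minimal Python sketch. It is not the authors' implementation. The boundary tokens `<start>` and `<end>`, the helper names `make_training_pairs` and `greedy_caption`, and the `next_word` callable (a stand-in for the trained VGG16 + LSTM model, assumed to already hold the image features) are all hypothetical, and greedy decoding is an assumption, since the abstract does not state the decoding strategy.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical boundary tokens; the abstract does not name the ones used.
START, END = "<start>", "<end>"


def make_training_pairs(caption: Sequence[str]) -> List[Tuple[List[str], str]]:
    """Expand one tokenized caption into (prefix, next_word) training pairs.

    For every position in the caption, the model would see the image plus the
    words so far (the prefix) and be trained to predict the following word.
    """
    tokens = [START, *caption, END]
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def greedy_caption(
    next_word: Callable[[List[str]], str], max_len: int = 20
) -> List[str]:
    """Generate a caption word by word until END or max_len words.

    `next_word` stands in for the trained model: given the words generated so
    far, it returns the single most likely next word (greedy decoding, which
    is an assumption here).
    """
    words = [START]
    for _ in range(max_len):
        word = next_word(words)
        if word == END:
            break
        words.append(word)
    return words[1:]  # drop the START token


# Toy stand-in model that replays a fixed sequence of placeholder tokens.
script = ["w1", "w2", "w3", END]
toy_model = lambda prefix: script[len(prefix) - 1]

pairs = make_training_pairs(["w1", "w2", "w3"])
caption = greedy_caption(toy_model)
```

In this sketch a three-word caption yields four training pairs, the last of which teaches the model to emit the end token. The `max_len` cap keeps generation bounded if the end token is never predicted.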

