BanglaWriting: A multi-purpose offline Bangla handwriting dataset

11/15/2020
by M. F. Mridha et al.

This article presents BanglaWriting, a Bangla handwriting dataset containing single-page handwriting samples from 260 individuals of different personalities and ages. Each page includes bounding boxes that enclose every word, along with the Unicode representation of the writing. In total, the dataset contains 21,234 words and 32,787 characters, covering 5,470 unique words of Bangla vocabulary. Apart from the usual words, it comprises 261 comprehensible and 450 incomprehensible overwritings. All bounding boxes and word labels were generated manually. The dataset can be used for complex optical character and word recognition, writer identification, and handwritten word segmentation. Furthermore, it is suitable for analyzing age-based and gender-based variation in handwriting.
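Each page is described as a set of word-level bounding boxes paired with Unicode labels. The following is a minimal sketch of how such annotations might be held in memory and used to crop word images and compute word counts like those reported above. The record layout, the field names (`writer_id`, `x_min`, and so on), and the half-open pixel coordinates are all hypothetical assumptions for illustration, not the dataset's actual file format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class WordBox:
    # Hypothetical record: one manually drawn box plus its Unicode label.
    # Coordinates are assumed to be half-open: [x_min, x_max) x [y_min, y_max).
    x_min: int
    y_min: int
    x_max: int
    y_max: int
    label: str  # Unicode transcription of the handwritten word


@dataclass
class Page:
    writer_id: str  # hypothetical identifier for the writer of this page
    words: list[WordBox]


def crop(image: list[list[int]], box: WordBox) -> list[list[int]]:
    """Cut one word region out of a page image stored as rows of pixels."""
    return [row[box.x_min:box.x_max] for row in image[box.y_min:box.y_max]]


def corpus_stats(pages: list[Page]) -> dict[str, int]:
    """Count all labeled words and the distinct word labels across pages."""
    labels = [w.label for p in pages for w in p.words]
    return {"words": len(labels), "unique_words": len(set(labels))}


page = Page("w001", [
    WordBox(0, 0, 2, 2, "আমি"),
    WordBox(2, 0, 4, 2, "ভাত"),
    WordBox(0, 2, 2, 4, "আমি"),
])
print(corpus_stats([page]))  # {'words': 3, 'unique_words': 2}
```

Counting characters is deliberately left out: for Bangla, a "character" could mean a Unicode code point or a user-perceived grapheme cluster, and the two counts differ whenever conjuncts or vowel signs appear.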


