Synthetic data generation for Indic handwritten text recognition

by Partha Pratim Roy, et al.

This paper presents a novel approach to generating synthetic datasets for handwritten word recognition systems. Handwritten scripts are difficult to recognize when sufficient training data is not readily available or is expensive to collect, and the lack of a suitable dataset makes recognition systems hard to train. To overcome this problem, synthetic data can be used to create a training dataset or to expand an existing one and so improve recognition performance. Any available digital text, such as that from online newspapers and similar sources, can be used to generate synthetic data. In this paper, we propose adding distortion and deformation to digital data in such a way that the underlying pattern is preserved, so that the resulting image closely resembles actual handwritten samples. The images produced can be used on their own to train the system, or combined with natural handwritten data to augment the original dataset and improve the recognition system. We experimented with synthetic data to improve the recognition accuracy of isolated characters and words. The framework is tested on two Indic scripts, Devanagari (Hindi) and Bengali (Bangla), for numeral, character and word recognition, and the experiments gave encouraging results. Finally, an experiment with Latin text verifies the utility of the approach.
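The idea of deforming clean digital text while preserving its underlying pattern can be illustrated with a small sketch. The abstract does not specify the distortion model, so the sinusoidal baseline wobble and horizontal shear below are assumptions chosen only for illustration, as are the function name `distort` and its parameters; the rectangular "glyph" stands in for a rendered character.

```python
import numpy as np

def distort(image, amplitude=2.0, wavelength=16.0, shear=0.2, seed=0):
    """Warp a 2-D image with a wavy baseline and a slant (hypothetical model).

    For every output pixel we compute the source pixel to copy from
    (inverse mapping with nearest-neighbour sampling). Pixels whose
    source falls outside the image are left as background (0).
    """
    rng = np.random.default_rng(seed)
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)

    # Random phase so that different seeds give different wobbles.
    phase = rng.uniform(0.0, 2.0 * np.pi)

    # Slant: horizontal offset grows with distance from the middle row.
    src_x = xs + shear * (ys - h / 2.0)
    # Wobble: vertical offset follows a sine wave along the line.
    src_y = ys + amplitude * np.sin(2.0 * np.pi * xs / wavelength + phase)

    src_x = np.rint(src_x).astype(int)
    src_y = np.rint(src_y).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)

    out = np.zeros_like(image)
    out[valid] = image[src_y[valid], src_x[valid]]
    return out

# Stand-in for a rendered character: a filled rectangle on a blank canvas.
glyph = np.zeros((32, 64), dtype=np.uint8)
glyph[8:24, 10:54] = 1

# Several seeds yield several distinct "handwritten-like" variants.
variants = [distort(glyph, seed=s) for s in range(5)]
```

Keeping the displacements small relative to the glyph size is what keeps the pattern recognizable; larger values of `amplitude` or `shear` would trade fidelity for variety.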



Generating Synthetic Data for Text Recognition

Generating synthetic images is an art which emulates the natural process...

Unsupervised Writer Adaptation for Synthetic-to-Real Handwritten Word Recognition

Handwritten Text Recognition (HTR) is still a challenging problem becaus...

Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition

In this work we present a framework for the recognition of natural scene...

Fonts-2-Handwriting: A Seed-Augment-Train framework for universal digit classification

In this paper, we propose a Seed-Augment-Train/Transfer (SAT) framework ...

A Scalable Handwritten Text Recognition System

Many studies on (Offline) Handwritten Text Recognition (HTR) systems hav...

Cross-language Framework for Word Recognition and Spotting of Indic Scripts

Handwritten word recognition and spotting of low-resource scripts are di...

Handwritten Stenography Recognition and the LION Dataset

Purpose: In this paper, we establish a baseline for handwritten stenogra...
