Line and Word Matching in Old Documents

12/17/2004
by A. Marcolino, et al.

This paper addresses the problem of building an index based on word matching. It assumes that the book has been digitised as well as possible and that some pre-processing, such as line-orientation correction and partial noise removal, has already been applied. However, two main factors make it impossible to apply ordinary optical character recognition (OCR) techniques: the presence of antique fonts and the degraded state of many characters, caused by unrecoverable deterioration of the original over time. In this paper we give a short introduction to word segmentation, which involves finding the lines that characterise a word. We then discuss different approaches to word matching and how they can be combined to obtain an ordered list of candidate words for matching. The discussion is illustrated with examples.
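The abstract does not specify the segmentation or matching algorithms, so the Python below is only a minimal sketch under stated assumptions, not the authors' method. It swaps in two common techniques: a vertical projection profile to split a binarised text line into word spans, and a weighted sum of min-max-normalised scores to fuse several matchers into one ordered candidate list. The function names, the `min_gap` threshold, the matcher names (`shape`, `profile`), the candidate words and all scores are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def segment_words(line: Sequence[Sequence[int]], min_gap: int = 2) -> List[Tuple[int, int]]:
    """Split a binarised text-line image (rows of 0/1, 1 = ink) into word spans.

    Uses the vertical projection profile: a run of at least `min_gap` blank
    columns separates two words; shorter blank runs are treated as gaps
    between characters of the same word. Returns (start, end) column pairs,
    with `end` exclusive.
    """
    width = len(line[0]) if line else 0
    # Number of ink pixels in each column.
    profile = [sum(row[c] for row in line) for c in range(width)]

    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None  # first column of the word being built
    last_ink = -1                # last inked column seen so far
    for c, ink in enumerate(profile):
        if not ink:
            continue
        if start is None:
            start = c
        elif c - last_ink - 1 >= min_gap:
            # Blank run wide enough: close the current word, open a new one.
            spans.append((start, last_ink + 1))
            start = c
        last_ink = c
    if start is not None:
        spans.append((start, last_ink + 1))
    return spans


def combine_rankings(scores: Dict[str, Dict[str, float]],
                     weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Fuse per-matcher similarity scores into one ordered candidate list.

    scores[method][candidate] is a similarity (higher = better match). Each
    method's scores are min-max normalised to [0, 1] before the weighted sum,
    so matchers on different scales contribute comparably. A candidate absent
    from a method's scores simply gets no contribution from that method.
    """
    combined: Dict[str, float] = {}
    for method, per_cand in scores.items():
        if not per_cand:
            continue
        lo, hi = min(per_cand.values()), max(per_cand.values())
        span = hi - lo
        for cand, s in per_cand.items():
            norm = (s - lo) / span if span else 1.0
            combined[cand] = combined.get(cand, 0.0) + weights.get(method, 1.0) * norm
    # Highest combined score first; ties broken alphabetically for determinism.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Hypothetical 2-row line image containing two "words".
    line = [
        [1, 1, 0, 1, 0, 0, 1, 1],
        [0, 1, 0, 1, 0, 0, 0, 1],
    ]
    print(segment_words(line))  # [(0, 4), (6, 8)]

    # Hypothetical scores from two matchers on different scales.
    scores = {
        "shape": {"anno": 0.9, "anni": 0.7, "anima": 0.2},
        "profile": {"anno": 40.0, "anni": 45.0, "anima": 10.0},
    }
    ranked = combine_rankings(scores, weights={"shape": 1.0, "profile": 0.5})
    print([cand for cand, _ in ranked])  # ['anno', 'anni', 'anima']
```

Min-max normalisation is one simple choice. A rank-based fusion such as a Borda count is an alternative when the raw scores of different matchers are not comparable at all.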

