Text Line Identification in Tagore's Manuscript
In this paper, a text line identification method is proposed. The text lines of printed document are easy to segment due to uniform straightness of the lines and sufficient gap between the lines. But in handwritten documents, the line is non-uniform and interline gaps are variable. We take Rabindranath Tagore's manuscript as it is one of the most difficult manuscripts that contain doodles. Our method consists of a pre-processing stage to clean the document image. Then we separate doodles from the manuscript to get the textual region. After that we identify the text lines on the manuscript. For text line identification, we use window examination, black run-length smearing, horizontal histogram and connected component analysis.
READ FULL TEXT