Genetic Programming for Document Segmentation and Region Classification Using Discipulus
Document segmentation is a method of rending the document into distinct regions. A document is an assortment of information and a standard mode of conveying information to others. Pursuance of data from documents involves ton of human effort, time intense and might severely prohibit the usage of data systems. So, automatic information pursuance from the document has become a big issue. It is been shown that document segmentation will facilitate to beat such problems. This paper proposes a new approach to segment and classify the document regions as text, image, drawings and table. Document image is divided into blocks using Run length smearing rule and features are extracted from every blocks. Discipulus tool has been used to construct the Genetic programming based classifier model and located 97.5
READ FULL TEXT