Ming-Wei Lin ; Jules-Raymond Tapamo ; Baird Ndovie
-
A Texture-based Method for Document Segmentation and Classification
arima:1878 -
Revue Africaine de Recherche en Informatique et Mathématiques Appliquées,
October 15, 2007,
Volume 6, April 2007, joint Special Issue ARIMA/SACJ on Advances in end-user data mining techniques
-
https://doi.org/10.46298/arima.1878
Authors: Ming-Wei Lin 1; Jules-Raymond Tapamo 1; Baird Ndovie 1
1 School of Computer Science
In this paper we present a hybrid approach to segmenting and classifying the contents of document images. A document image is segmented into three types of regions: graphics, text, and space. The image is subdivided into blocks, and for each block five GLCM (Grey Level Co-occurrence Matrix) features are extracted. Based on these features, the blocks are clustered into three groups using the K-Means algorithm, and connected blocks that belong to the same group are merged. The groups are then classified using pre-learned heuristic rules. Experiments were conducted on scanned newspapers and on images from the MediaTeam Document Database.
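
The block-wise feature extraction and clustering steps described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation and rests on several assumptions: the abstract does not name the five GLCM features, so contrast, energy, homogeneity, entropy, and correlation are used here as common choices; the horizontal pixel offset, the block size, the number of grey levels, and the deterministic farthest-point initialisation of K-Means are likewise illustrative. The function names (`glcm`, `glcm_features`, `block_features`, `kmeans`) are hypothetical. Merging of connected blocks and the heuristic classification rules are not covered.

```python
import math

def glcm(block, levels, dx=1, dy=0):
    """Normalised, symmetric co-occurrence matrix for one pixel offset (assumed: right neighbour)."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(block), len(block[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = block[r][c], block[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count the pair in both directions
    total = sum(map(sum, counts)) or 1.0
    return [[v / total for v in row] for row in counts]

def glcm_features(p):
    """Five illustrative features (an assumption; the abstract does not list them)."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    contrast = sum(v * (i - j) ** 2 for i, j, v in cells)
    energy = sum(v * v for _, _, v in cells)
    homogeneity = sum(v / (1 + (i - j) ** 2) for i, j, v in cells)
    entropy = -sum(v * math.log(v) for _, _, v in cells if v > 0)
    mu = sum(i * v for i, _, v in cells)  # row and column means agree for a symmetric GLCM
    var = sum((i - mu) ** 2 * v for i, _, v in cells)
    corr = sum((i - mu) * (j - mu) * v for i, j, v in cells) / var if var > 0 else 1.0
    return [contrast, energy, homogeneity, entropy, corr]

def block_features(image, size, levels):
    """Split the image into non-overlapping size x size blocks; one feature vector per block."""
    feats = []
    for r in range(0, len(image) - size + 1, size):
        for c in range(0, len(image[0]) - size + 1, size):
            block = [row[c:c + size] for row in image[r:r + size]]
            feats.append(glcm_features(glcm(block, levels)))
    return feats

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k=3, iters=20):
    """Lloyd's K-Means with farthest-point initialisation (an assumption, chosen for determinism)."""
    centres = [list(points[0])]
    while len(centres) < k:
        centres.append(list(max(points, key=lambda p: min(sq_dist(p, c) for c in centres))))
    labels = []
    for _ in range(iters):
        # Assignment step: each block goes to its nearest centre.
        labels = [min(range(k), key=lambda c: sq_dist(p, centres[c])) for p in points]
        # Update step: each centre moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Calling `kmeans(block_features(image, size, levels))` on an image given as a list of rows of integer grey levels returns one cluster label per block, in row-major block order. The features are left unscaled here, so contrast tends to dominate the distance; a practical pipeline would likely normalise each feature first.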