Ming-Wei Lin ; Jules-Raymond Tapamo ; Baird Ndovie
-
A Texture-based Method for Document Segmentation and Classification
arima:1878 -
Revue Africaine de la Recherche en Informatique et Mathématiques Appliquées,
October 15, 2007,
Volume 6, April 2007, joint Special Issue ARIMA/SACJ on Advances in end-user data mining techniques
-
https://doi.org/10.46298/arima.1878
In this paper we present a hybrid approach to segmenting and classifying the contents of document images. A document image is segmented into three types of regions: Graphics, Text and Space. The image is subdivided into blocks, and five GLCM (Grey Level Co-occurrence Matrix) features are extracted from each block. Based on these features, the blocks are clustered into three groups using the K-Means algorithm, and connected blocks that belong to the same group are merged. The groups are then classified using pre-learned heuristic rules. Experiments were conducted on scanned newspapers and on images from the MediaTeam Document Database.
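The block-wise part of the pipeline described in the abstract can be illustrated with a minimal sketch in plain Python. Several things here are assumptions rather than details from the paper: the abstract does not name its five GLCM features, so contrast, energy, entropy, homogeneity and correlation are used as a common choice; the single horizontal pixel offset, the block size, and all function names (`split_blocks`, `glcm`, `glcm_features`, `kmeans`) are hypothetical. The merging of connected blocks and the pre-learned heuristic rules are not shown.

```python
import math
import random


def split_blocks(image, size):
    """Cut a 2-D list of grey levels into non-overlapping size x size blocks."""
    rows, cols = len(image), len(image[0])
    return [[row[c:c + size] for row in image[r:r + size]]
            for r in range(0, rows - size + 1, size)
            for c in range(0, cols - size + 1, size)]


def glcm(block, levels, dx=1, dy=0):
    """Normalised, symmetric co-occurrence matrix for one pixel offset.

    Assumption: a single offset (dx, dy); the paper may use other offsets.
    """
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(block), len(block[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = block[r][c], block[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions
    total = sum(map(sum, counts)) or 1.0
    return [[v / total for v in row] for row in counts]


def glcm_features(p):
    """Five texture features (an assumed set, not confirmed by the abstract)."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    contrast = sum(v * (i - j) ** 2 for i, j, v in cells)
    energy = sum(v * v for _, _, v in cells)
    entropy = -sum(v * math.log(v) for _, _, v in cells if v > 0)
    homogeneity = sum(v / (1 + (i - j) ** 2) for i, j, v in cells)
    # The matrix is symmetric, so row and column means/variances coincide.
    mean = sum(i * v for i, _, v in cells)
    var = sum(v * (i - mean) ** 2 for i, _, v in cells)
    cov = sum(v * (i - mean) * (j - mean) for i, j, v in cells)
    correlation = cov / var if var > 0 else 1.0  # convention for flat blocks
    return [contrast, energy, entropy, homogeneity, correlation]


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-Means; returns one cluster label per point."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centres[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster is empty
                centres[c] = [sum(x) / len(members) for x in zip(*members)]
    return labels


def cluster_blocks(image, size, levels, k=3):
    """Blocks -> GLCM features -> K-Means labels (k=3 as in the abstract)."""
    feats = [glcm_features(glcm(b, levels)) for b in split_blocks(image, size)]
    return kmeans(feats, k)
```

Calling `cluster_blocks(image, size=16, levels=8)` on an image quantised to 8 grey levels would return one label per block in row-major order. The features are clustered unscaled here; in practice they have different ranges, so some normalisation before K-Means is probably advisable.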