Ming-Wei Lin ; Jules-Raymond Tapamo ; Baird Ndovie - A Texture-based Method for Document Segmentation and Classification

arima:1878 - Revue Africaine de la Recherche en Informatique et Mathématiques Appliquées, October 15, 2007, Volume 6, April 2007, joint Special Issue ARIMA/SACJ on Advances in end-user data mining techniques - https://doi.org/10.46298/arima.1878

In this paper we present a hybrid approach to segmenting and classifying the contents of document images. A document image is segmented into three types of regions: Graphics, Text and Space. The image is first subdivided into blocks, and five GLCM (Grey Level Co-occurrence Matrix) features are extracted from each block. Based on these features, the blocks are clustered into three groups with the K-Means algorithm, and connected blocks that belong to the same group are merged. The groups are then classified using pre-learned heuristic rules. Experiments were conducted on scanned newspapers and on images from the MediaTeam Document Database.
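The pipeline summarised above (block subdivision, per-block GLCM features, K-Means clustering, merging of connected same-cluster blocks) can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not name the five GLCM features, so energy, contrast, homogeneity, entropy and correlation are assumed; the single horizontal offset (1, 0), the 4x4 block size, four grey levels, min-max feature scaling and farthest-first K-Means initialisation are also assumptions, and every function name is hypothetical. The pre-learned heuristic rules that label groups as Graphics, Text or Space are not given in the abstract, so the sketch stops after merging.

```python
import math

# Hypothetical sketch of the abstract's pipeline; all parameter choices are assumptions.

def split_blocks(img, size):
    """Cut a 2-D list of grey levels into non-overlapping size x size blocks keyed by (row, col)."""
    return {
        (by, bx): [row[bx * size:(bx + 1) * size] for row in img[by * size:(by + 1) * size]]
        for by in range(len(img) // size)
        for bx in range(len(img[0]) // size)
    }

def glcm(block, levels, dx=1, dy=0):
    """Normalised symmetric co-occurrence matrix for one offset (assumed: one pixel to the right)."""
    m = [[0.0] * levels for _ in range(levels)]
    h, w = len(block), len(block[0])
    total = 0
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                i, j = block[y][x], block[ny][nx]
                m[i][j] += 1
                m[j][i] += 1
                total += 2
    return [[v / total for v in row] for row in m] if total else m

def glcm_features(p):
    """Assumed five features: energy, contrast, homogeneity, entropy, correlation."""
    cells = [(i, j, p[i][j]) for i in range(len(p)) for j in range(len(p))]
    energy = sum(v * v for _, _, v in cells)
    contrast = sum((i - j) ** 2 * v for i, j, v in cells)
    homogeneity = sum(v / (1 + abs(i - j)) for i, j, v in cells)
    entropy = -sum(v * math.log2(v) for _, _, v in cells if v > 0)
    mu = sum(i * v for i, _, v in cells)  # symmetric matrix, so row and column means agree
    var = sum((i - mu) ** 2 * v for i, _, v in cells)
    # A constant block has zero variance; correlation is defined as 1.0 in that case.
    corr = sum((i - mu) * (j - mu) * v for i, j, v in cells) / var if var > 0 else 1.0
    return [energy, contrast, homogeneity, entropy, corr]

def min_max_scale(vectors):
    """Rescale each feature to [0, 1] so no single feature dominates the distance."""
    cols = list(zip(*vectors))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(vec, lo, hi)] for vec in vectors]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50):
    """Plain K-Means; farthest-first initialisation is an assumption that keeps it deterministic."""
    centers = [list(points[0])]
    while len(centers) < k:
        centers.append(list(max(points, key=lambda p: min(sq_dist(p, c) for c in centers))))
    labels = []
    for _ in range(iters):
        new = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new == labels:
            break  # assignments stable: converged
        labels = new
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def merge_regions(labels):
    """Flood-fill 4-connected blocks sharing a cluster label; returns (label, positions) per region."""
    seen, regions = set(), []
    for start, lab in labels.items():
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], []
        while stack:
            y, x = stack.pop()
            members.append((y, x))
            for nb in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                if nb not in seen and labels.get(nb) == lab:
                    seen.add(nb)
                    stack.append(nb)
        regions.append((lab, sorted(members)))
    return regions

def segment(img, block=4, levels=4, k=3):
    blocks = split_blocks(img, block)
    positions = list(blocks)
    feats = min_max_scale([glcm_features(glcm(blocks[pos], levels)) for pos in positions])
    labels = dict(zip(positions, kmeans(feats, k)))
    return labels, merge_regions(labels)

# Synthetic 8x12 page with 4 grey levels: blank strip, checkerboard strip, horizontal stripes.
img = [[0 if x < 4 else (3 * ((x + y) % 2) if x < 8 else 3 * (y % 2)) for x in range(12)]
       for y in range(8)]
labels, regions = segment(img)
print(len(regions))  # 3
```

On this synthetic page each texture falls into its own cluster, and the two vertically adjacent blocks of each strip merge into one region. Scaling before clustering matters because contrast can reach (levels - 1)^2 = 9 here, which would otherwise dominate the Euclidean distance used by K-Means.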


Volume: Volume 6, April 2007, joint Special Issue ARIMA/SACJ on Advances in end-user data mining techniques
Published on: October 15, 2007
Submitted on: March 27, 2007
Keywords: Information Retrieval, Texture Segmentation, Document Image Analysis, Feature Extraction, Grey Level Co-occurrence Matrix (GLCM), K-Means Clustering, [INFO] Computer Science [cs], [MATH] Mathematics [math]
