Marwa Naili ; Anja Chaibi ; Henda Ghézala - Arabic topic identification based on empirical studies of topic models

arima:3102 - Revue Africaine de Recherche en Informatique et Mathématiques Appliquées, August 3, 2017, Volume 27 - 2017 - Special issue CARI 2016 - https://doi.org/10.46298/arima.3102
Arabic topic identification based on empirical studies of topic models

Authors: Marwa Naili ; Anja Chaibi ; Henda Ghézala

    This paper focuses on topic identification for the Arabic language based on topic models. We study Latent Dirichlet Allocation (LDA) as an unsupervised method for Arabic topic identification. To this end, an in-depth study of LDA is carried out at two levels: the stemming process and the choice of LDA hyper-parameters. At the first level, we study the effect of different Arabic stemmers on LDA. At the second level, we focus on the LDA hyper-parameters α and β and their impact on topic identification. This study shows that LDA is an efficient method for Arabic topic identification, especially when its hyper-parameters are chosen well. Another important result is the strong impact of the stemming algorithm on topic identification.
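    To show where α and β enter LDA, here is a minimal collapsed Gibbs sampler, which is one common way to fit LDA. The abstract does not say which inference procedure or toolkit the authors used, so this is an illustrative sketch under that assumption, not their implementation; the function name `lda_gibbs`, its parameters and the toy corpus are hypothetical. α acts as a pseudo-count added to each document's topic counts and β as a pseudo-count added to each topic's word counts: a smaller α pushes each document towards fewer topics, and a smaller β pushes each topic towards fewer words. Stemming happens before this step. The input tokens are assumed to be already stemmed, so switching stemmers changes the vocabulary size V and the counts the sampler works with.

```python
import random


def lda_gibbs(docs, num_topics, alpha, beta, iterations=200, seed=0):
    """Hypothetical helper: collapsed Gibbs sampling for LDA.

    docs  -- list of documents, each a list of (already stemmed) tokens
    alpha -- symmetric Dirichlet prior on document-topic proportions
    beta  -- symmetric Dirichlet prior on topic-word distributions
    Returns (theta, phi, vocab).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    ndk = [[0] * K for _ in docs]        # tokens in doc d assigned to topic k
    nkw = [[0] * V for _ in range(K)]    # times word v is assigned to topic k
    nk = [0] * K                         # total tokens assigned to topic k
    z = []                               # current topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional: (n_dk + alpha) * (n_kv + beta) / (n_k + V * beta)
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                # Record the newly sampled assignment.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    # Smoothed point estimates of the document-topic and topic-word distributions.
    theta = [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)]
           for k in range(K)]
    return theta, phi, vocab


if __name__ == "__main__":
    # Toy corpus of pre-stemmed tokens (illustrative only, not the paper's data).
    docs = [["كرة", "فريق", "مباراة", "فريق"],
            ["مباراة", "كرة", "لاعب"],
            ["سوق", "سعر", "اقتصاد", "سعر"],
            ["اقتصاد", "سوق", "بنك"]]
    theta, phi, vocab = lda_gibbs(docs, num_topics=2, alpha=0.1, beta=0.01)
    for k, row in enumerate(phi):
        top = sorted(range(len(vocab)), key=lambda v: -row[v])[:3]
        print(k, [vocab[v] for v in top])
```

    Running the script prints the three highest-probability words of each topic. On a corpus this small the topic split can change with `seed`, `alpha` and `beta`, which is a small-scale view of the sensitivity to hyper-parameters that the abstract describes.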


    Volume: Volume 27 - 2017 - Special issue CARI 2016
    Published on: August 3, 2017
    Accepted on: August 3, 2017
    Submitted on: August 2, 2017
    Keywords: Arabic stemmers, Latent Dirichlet Allocation, Topic identification, LDA hyper-parameters α and β, Topic models, ACM: I.2.7.6, [INFO.INFO-TT] Computer Science [cs]/Document and Text Processing
