Marwa Naili ; Anja Chaibi ; Henda Ghézala
Arabic topic identification based on empirical studies of topic models
arima:3102
Revue Africaine de Recherche en Informatique et Mathématiques Appliquées,
August 3, 2017,
Volume 27 - 2017 - Special issue CARI 2016
https://doi.org/10.46298/arima.3102
1 Laboratoire de recherche en Génie Logiciel, Applications distribuées, Systèmes décisionnels et Imagerie intelligente [Manouba]
This paper addresses topic identification for Arabic using topic models. We study Latent Dirichlet Allocation (LDA) as an unsupervised method for Arabic topic identification, examining it at two levels: the stemming process and the choice of LDA hyper-parameters. At the first level, we study the effect of different Arabic stemmers on LDA. At the second level, we focus on the LDA hyper-parameters α and β and their impact on topic identification. The study shows that LDA is an efficient method for Arabic topic identification, especially when the hyper-parameters are well chosen. Another important result is that the stemming algorithm has a strong impact on topic identification.
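To make the role of the two hyper-parameters concrete, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling, using only the Python standard library. It is not the authors' implementation. It assumes the common convention in which α smooths the per-document topic distribution and β smooths the per-topic word distribution. The function name `lda_gibbs`, the toy corpus, and the chosen values of α and β are all hypothetical, and the tokens stand in for the output of whichever stemmer is applied beforehand.

```python
import random


def lda_gibbs(docs, num_topics, alpha, beta, iterations=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling on pre-tokenized documents.

    alpha: symmetric Dirichlet prior on document-topic distributions (assumed convention).
    beta:  symmetric Dirichlet prior on topic-word distributions (assumed convention).
    Returns (theta, phi, vocab): document-topic and topic-word probabilities.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables maintained by the sampler.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Assign every token to a random topic to start.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic for this token (unnormalized).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1

    # Smoothed point estimates from the final counts.
    theta = [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(topic_word[t][v] + beta) / (topic_total[t] + vocab_size * beta)
         for v in range(vocab_size)]
        for t in range(num_topics)
    ]
    return theta, phi, vocab


# Hypothetical toy corpus: each document is a list of already-stemmed tokens.
docs = [
    ["match", "goal", "team", "goal", "player"],
    ["team", "player", "match", "coach"],
    ["market", "bank", "price", "bank", "trade"],
    ["price", "trade", "market", "economy"],
]

theta, phi, vocab = lda_gibbs(docs, num_topics=2, alpha=0.1, beta=0.01)

for t, row in enumerate(phi):
    top = sorted(range(len(vocab)), key=lambda v: row[v], reverse=True)[:3]
    print("topic", t, [vocab[v] for v in top])
```

In this sketch, smaller values of α push each document toward fewer topics and smaller values of β push each topic toward fewer words, which is why the choice of both can change the identified topics. Rerunning `lda_gibbs` over a grid of α and β values is one simple way to explore that sensitivity.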
Published on: August 3, 2017
Accepted on: July 3, 2017
Submitted on: August 2, 2017
Keywords: Topic identification, Latent Dirichlet Allocation, LDA hyper-parameters α and β, Arabic stemmers, Topic models
ACM: I. Computing Methodologies / I.2 Artificial Intelligence / I.2.7 Natural Language Processing / I.2.7.6 Text analysis
[INFO.INFO-TT] Computer Science [cs] / Document and Text Processing