Comparing tree sets using descriptor vectors based on hidden Markov models

Sylvain Iloga

doi:10.46298/arima.9107

arima:9107 - Revue Africaine de Recherche en Informatique et Mathématiques Appliquées, August 23, 2022, Volume 36 - Special issue CRI 2021 - 2022 - https://doi.org/10.46298/arima.9107

Authors: Sylvain Iloga ^1,^2,^3,^4

1 Higher Teachers' Training College, University of Maroua, [ETIS - UMR 8051]
2 Unité de modélisation mathématique et informatique des systèmes complexes [Bondy]
3 Equipes Traitement de l'Information et Systèmes
4 Higher Teachers' Training College, University of Maroua

Trees are among the most studied data structures, and several techniques have consequently been developed for comparing two trees belonging to the same category. Until the end of 2020, there was a serious lack of suitable metrics for comparing two weighted trees or two trees from different categories. The problem of comparing two tree sets had also not been specifically addressed. These limitations were overcome in a paper published in 2021, which proposed a customizable metric based on hidden Markov models for comparing two tree sets, each containing a mixture of trees belonging to various categories. Unfortunately, that metric does not allow the use of classifiers that are not metric-dependent and that take descriptor vectors as inputs. This paper addresses this drawback by deriving a descriptor vector for each tree set using meta-information related to its corresponding models. Two tree sets are then compared by comparing their associated descriptor vectors. Classification experiments carried out on the databases FirstLast-L (FL), FirstLast-LW (FLW) and Stanford Sentiment Treebank (SSTB) showed best accuracies of 99.75%, 99.75% and 87.22%, respectively. The accuracies for FLW and SSTB are respectively 40.75% and 20.52% better than those obtained with the tree Edit distance. Additional clustering experiments yielded 54.25%, 98.75% and 75.53% of correctly clustered instances for FL, FLW and SSTB, respectively. No clustering was performed in existing work.

Source: HAL:hal-03582092v3

Volume: Volume 36 - Special issue CRI 2021 - 2022

Published on: August 23, 2022

Accepted on: July 26, 2022

Submitted on: February 21, 2022

Keywords: [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI], [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG], [INFO.INFO-MO] Computer Science [cs]/Modeling and Simulation, Trees, Comparison of tree sets, Distance between trees, Hidden Markov Models, Classification, Clustering, Tree Edit distance
