Thomas Messi Nguelé ; Armel Jacques Nzekon Nzeko'o ; Damase Donald Onana - Parallelization of Recurrent Neural Network training algorithm with implicit aggregation on multi-core architectures

arima:13400 - Revue Africaine de Recherche en Informatique et Mathématiques Appliquées, 11 February 2025, Volume 42 - Special Issue CRI 2023 - 2024 - https://doi.org/10.46298/arima.13400

Authors: Thomas Messi Nguelé 1,2,3,4; Armel Jacques Nzekon Nzeko'o 1,3,4; Damase Donald Onana 1,4

  • 1 University of Yaoundé [Cameroon]
  • 2 University of Ebolowa
  • 3 Unité de modélisation mathématique et informatique des systèmes complexes [Bondy]
  • 4 University of Yaoundé 1

Recent work has shown that deep learning algorithms are effective for various tasks, whether in Natural Language Processing (NLP) or in Computer Vision (CV). One particularity of these algorithms is that their performance improves as the amount of training data grows. However, sequential execution of these algorithms on large amounts of data can take a very long time. In this paper, we consider the problem of training a Recurrent Neural Network (RNN) for the task of detecting hateful (aggressive) messages. We first compared the sequential execution of three RNN variants and showed that Long Short-Term Memory (LSTM) provides better metric performance but requires a longer execution time than the Gated Recurrent Unit (GRU) and the standard RNN. To obtain both good metric performance and reduced execution time, we implemented the training algorithms in parallel. We propose a parallel algorithm based on an implicit aggregation strategy, in contrast to the existing approach, which relies on an explicit aggregation function. We show that the convergence of the proposed parallel algorithm is close to that of the sequential algorithm. Experimental results on a 32-core machine at 1.5 GHz with 62 GB of RAM show that better results are obtained with the parallelization strategy we propose. For example, with an LSTM on a dataset of more than 100k comments, we obtained an f-measure of 0.922 and a speedup of 7 with our approach, compared to an f-measure of 0.874 and a speedup of 5 with explicit aggregation between workers.
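The abstract does not detail either parallelization strategy, so the sketch below is only a hypothetical illustration of the two ideas it contrasts: explicit aggregation, where each worker trains a private copy of the parameters that is then combined by an aggregation function (here, averaging), versus implicit aggregation, where workers update a single shared parameter vector so no separate aggregation step is needed. The toy one-weight linear model, the worker layout, the learning rate, and the Hogwild-style shared updates are assumptions for illustration, not the paper's implementation; Python threads are used only to show the aggregation logic, not to obtain real multi-core speedup.

```python
import numpy as np
from threading import Thread

# Toy data: y = 3*x + noise, fitted with a single weight by SGD.
rng = np.random.default_rng(0)
X = rng.normal(size=10_000)
Y = 3.0 * X + 0.1 * rng.normal(size=10_000)

def sgd_pass(w, xs, ys, lr=0.01):
    """One SGD pass over (xs, ys), updating the 1-element array w in place."""
    for x, y in zip(xs, ys):
        grad = 2.0 * (w[0] * x - y) * x   # gradient of (w*x - y)^2
        w[0] -= lr * grad

def train_explicit(n_workers=4):
    """Explicit aggregation (assumed): private copies per worker, then averaged."""
    shards_x = np.array_split(X, n_workers)
    shards_y = np.array_split(Y, n_workers)
    copies = [np.zeros(1) for _ in range(n_workers)]
    threads = [Thread(target=sgd_pass, args=(copies[i], shards_x[i], shards_y[i]))
               for i in range(n_workers)]
    for t in threads: t.start()
    for t in threads: t.join()
    return float(np.mean([c[0] for c in copies]))   # the aggregation function

def train_implicit(n_workers=4):
    """Implicit aggregation (assumed): all workers update one shared vector."""
    shards_x = np.array_split(X, n_workers)
    shards_y = np.array_split(Y, n_workers)
    w = np.zeros(1)                                  # shared between workers
    threads = [Thread(target=sgd_pass, args=(w, shards_x[i], shards_y[i]))
               for i in range(n_workers)]
    for t in threads: t.start()
    for t in threads: t.join()
    return float(w[0])                               # no averaging step needed

if __name__ == "__main__":
    print("explicit aggregation:", train_explicit())  # both should approach 3.0
    print("implicit aggregation:", train_implicit())
```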


Volume: Volume 42 - Special Issue CRI 2023 - 2024
Published on: 11 February 2025
Accepted on: 28 November 2024
Submitted on: 12 April 2024
Keywords: Deep Learning, Recurrent Neural Network, hateful messages recognition, parallel programming, [INFO] Computer Science [cs], [INFO.INFO-DC] Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC], [STAT.ML] Statistics [stat]/Machine Learning [stat.ML]
