Thomas Messi Nguelé ; Armel Jacques Nzekon Nzeko'o ; Damase Donald Onana - Parallelization of Recurrent Neural Network training algorithm with implicit aggregation on multi-core architectures

arima:13400 - Revue Africaine de Recherche en Informatique et Mathématiques Appliquées, 11 February 2025, Volume 42 - Special Issue CRI 2023 - 2024 - https://doi.org/10.46298/arima.13400

Authors: Thomas Messi Nguelé 1,2,3,4; Armel Jacques Nzekon Nzeko'o 1,3,4; Damase Donald Onana 1,4

  • 1 University of Yaoundé [Cameroon]
  • 2 University of Ebolowa
  • 3 Unité de modélisation mathématique et informatique des systèmes complexes [Bondy]
  • 4 University of Yaoundé 1

Recent work has shown that deep learning algorithms are effective for various tasks, whether in Natural Language Processing (NLP) or in Computer Vision (CV). One particularity of these algorithms is that their performance improves as the amount of training data grows. However, sequential execution of these algorithms on large amounts of data can take a very long time. In this paper, we consider the problem of training a Recurrent Neural Network (RNN) for the task of detecting hateful (aggressive) messages. We first compared the sequential execution of three RNN variants and showed that Long Short-Term Memory (LSTM) provides better metric performance but requires a longer execution time than the Gated Recurrent Unit (GRU) and the standard RNN. To obtain both good metric performance and reduced execution time, we implemented the training algorithms in parallel. We propose a parallel algorithm based on an implicit aggregation strategy, in contrast to the existing approach, which relies on an explicit aggregation function. We show that the convergence of the proposed parallel algorithm is close to that of the sequential algorithm. Experimental results on a 32-core machine at 1.5 GHz with 62 GB of RAM show that better results are obtained with the parallelization strategy we propose. For example, with an LSTM on a dataset of more than 100k comments, we obtained an f-measure of 0.922 and a speedup of 7 with our approach, compared to an f-measure of 0.874 and a speedup of 5 with explicit aggregation between workers.
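The abstract does not detail either parallelization strategy, so the sketch below is only a hypothetical illustration of the two ideas it contrasts: explicit aggregation, where each worker trains a private copy of the parameters that is then combined by an aggregation function (here, averaging), versus implicit aggregation, where workers update a single shared parameter vector so no separate aggregation step is needed. The toy one-weight linear model, the worker layout, the learning rate, and the Hogwild-style shared updates are assumptions for illustration, not the paper's implementation; Python threads are used only to show the aggregation logic, not to obtain real multi-core speedup.

```python
import numpy as np
from threading import Thread

# Toy data: y = 3*x + noise, fitted with a single weight by SGD.
rng = np.random.default_rng(0)
X = rng.normal(size=10_000)
Y = 3.0 * X + 0.1 * rng.normal(size=10_000)

def sgd_pass(w, xs, ys, lr=0.01):
    """One SGD pass over (xs, ys), updating the 1-element array w in place."""
    for x, y in zip(xs, ys):
        grad = 2.0 * (w[0] * x - y) * x   # gradient of (w*x - y)^2
        w[0] -= lr * grad

def train_explicit(n_workers=4):
    """Explicit aggregation (assumed): private copies per worker, then averaged."""
    shards_x = np.array_split(X, n_workers)
    shards_y = np.array_split(Y, n_workers)
    copies = [np.zeros(1) for _ in range(n_workers)]
    threads = [Thread(target=sgd_pass, args=(copies[i], shards_x[i], shards_y[i]))
               for i in range(n_workers)]
    for t in threads: t.start()
    for t in threads: t.join()
    return float(np.mean([c[0] for c in copies]))   # the aggregation function

def train_implicit(n_workers=4):
    """Implicit aggregation (assumed): all workers update one shared vector."""
    shards_x = np.array_split(X, n_workers)
    shards_y = np.array_split(Y, n_workers)
    w = np.zeros(1)                                  # shared between workers
    threads = [Thread(target=sgd_pass, args=(w, shards_x[i], shards_y[i]))
               for i in range(n_workers)]
    for t in threads: t.start()
    for t in threads: t.join()
    return float(w[0])                               # no averaging step needed

if __name__ == "__main__":
    print("explicit aggregation:", train_explicit())  # both should approach 3.0
    print("implicit aggregation:", train_implicit())
```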


Volume: Volume 42 - Special Issue CRI 2023 - 2024
Published on: 11 February 2025
Accepted on: 28 November 2024
Submitted on: 12 April 2024
Keywords: Deep Learning, Recurrent Neural Network, hateful messages recognition, parallel programming, [INFO] Computer Science [cs], [INFO.INFO-DC] Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC], [STAT.ML] Statistics [stat]/Machine Learning [stat.ML]
