Under-resourced languages face substantial obstacles in speech recognition owing to the scarcity of resources and limited data availability, which impedes their development and widespread adoption. This paper presents a representation learning model that leverages existing frameworks based on self-supervised learning techniques—specifically, Contrastive Predictive Coding (CPC), wav2vec, and a bidirectional variant of CPC—by integrating them with multilingual learning approaches. We apply this model to three African languages: Wolof, Swahili, and Fongbe. Our evaluation of the resulting representations on a downstream task, automatic speech recognition, using an architecture analogous to DeepSpeech, reveals the model’s capacity to discern language-specific linguistic features. The results demonstrate promising performance, achieving Word Error Rates (WER) of 61% for Fongbe, 72% for Wolof, and 88% for Swahili. These findings underscore the potential of our approach in advancing speech recognition capabilities for under-resourced languages, particularly within the African linguistic landscape.
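At the core of CPC and wav2vec is a contrastive (InfoNCE-style) objective: a context vector summarizing past audio is trained to score the true future encoding higher than distractor encodings drawn from other time steps. The following is a minimal NumPy sketch of that loss, not the paper's implementation; the dot-product scoring and the function name are illustrative assumptions.

```python
import numpy as np

def info_nce_loss(context, positive, negatives):
    """Contrastive (InfoNCE-style) loss as used in CPC-like models.

    context:   (d,) aggregated context vector c_t (illustrative)
    positive:  (d,) encoding of the true future frame z_{t+k}
    negatives: (n, d) distractor encodings from other time steps
    Returns -log softmax probability assigned to the positive.
    """
    candidates = np.vstack([positive, negatives])   # (n+1, d)
    scores = candidates @ context                   # dot-product scores
    scores -= scores.max()                          # numerical stability
    log_prob_pos = scores[0] - np.log(np.exp(scores).sum())
    return -log_prob_pos

# Toy usage with random vectors standing in for encoder outputs.
rng = np.random.default_rng(0)
d = 16
loss = info_nce_loss(rng.normal(size=d),
                     rng.normal(size=d),
                     rng.normal(size=(8, d)))
```

Minimizing this loss over many (context, future) pairs is what drives the encoder to capture predictive, language-relevant structure without transcriptions; the multilingual variant simply draws training audio from all three languages.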