Presenting the Global Top 100 outstanding projects: Afrocentric NLP

Published on May 30, 2023

After a thorough scientific, ethical, and business review of all projects submitted to the 2022 IRCAI Global Top 100 call, IRCAI has deemed ten submissions “outstanding” based on their AI integrity, potential impact on the SDGs, business sustainability, and ethical design. This article focuses on one of these ten submissions, “Afrocentric NLP”, a project that has produced a neural language identification toolkit for 517 African languages. In early May, we had the privilege of welcoming team member Ife Adebara (a PhD researcher at the University of British Columbia) to our UN STI Forum side event.

Underrepresented voices in the digital sphere

The speed at which we access information from the internet has changed in remarkable ways. Whether you are looking for a guide to improving your algebra skills, an article on the history of your hometown, or PDF instructions on how to use that old camera you found in your grandmother’s pantry, the internet offers a wealth of valuable information. However, language barriers keep much of this content out of reach for many people. In the digital sphere, a few dominant languages are amplified and end up speaking for others, while many languages remain severely underrepresented. Web translators can help translate content, but many languages are still excluded from their automated capabilities, further exacerbating the issue.

An Afrocentric approach to technological development

Motivated to take an “Afrocentric approach to technology development,” Ife and her team are developing various language identification models. These are machine learning models that automatically determine the language of a given text or speech sample. Their toolkit, AfroLID, has been trained on large datasets containing text and language samples from a wide range of African languages. It can thus learn the patterns, statistical features, and linguistic properties specific to each of these languages, enabling it to make accurate predictions about the language of new, unseen samples. Ife explains that AfroLID represents an “important first step in human language processing.” Language identification is an important prerequisite for decomposing texts into smaller units such as individual words, characters, and other linguistic units (called “tokens” in NLP), which in turn facilitates the development of multilingual models and machine translation services. “AfroLID is a multidomain web dataset manually curated from 14 language families domiciled in 50 African countries across 5 orthographic systems,” Ife explains. In total, the language identification model covers an astounding 517 languages and language varieties across the continent.
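To make the idea of language identification concrete, here is a minimal sketch in Python. It is not AfroLID's neural model and does not use its data. It is a toy character-trigram classifier with add-one smoothing, trained on a handful of illustrative English and Swahili phrases. All names in it (`char_ngrams`, `train`, `identify`, `SAMPLES`) are hypothetical and exist only for this example.

```python
import math
from collections import Counter

# Toy training data, for illustration only (not AfroLID's dataset).
SAMPLES = {
    "eng": [
        "good morning everyone",
        "thank you very much",
        "welcome to our home",
        "i like reading books",
    ],
    "swa": [
        "habari ya asubuhi",
        "asante sana",
        "karibu nyumbani",
        "ninapenda kusoma vitabu",
    ],
}


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(samples):
    """Count character trigrams per language label."""
    return {
        label: Counter(g for sentence in sentences for g in char_ngrams(sentence))
        for label, sentences in samples.items()
    }


def identify(text, models):
    """Return the label whose trigram counts give the text the highest log-likelihood."""
    # Shared vocabulary size, plus one bucket for trigrams never seen in training.
    vocab_size = len(set().union(*models.values())) + 1
    best_label, best_score = None, -math.inf
    for label, counts in models.items():
        total = sum(counts.values())
        # Add-one smoothing keeps unseen trigrams from producing log(0).
        score = sum(
            math.log((counts[g] + 1) / (total + vocab_size))
            for g in char_ngrams(text)
        )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


MODELS = train(SAMPLES)
```

Calling `identify("asante sana", MODELS)` scores the phrase against each language's trigram counts and returns the best-matching label. A production system covering hundreds of languages needs far more data and a much stronger model, but the underlying task is the same: map a piece of text to a language label.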

A publicly available toolkit

The LID toolkit is publicly available to “aid the continued development of natural language processing models for African countries,” she notes. The data are of high quality and are manually curated to “ensure that languages are represented correctly.” “This is especially important for Africa,” she adds, “where African people need to continue to be taught in the languages they prefer to speak and learn.”

Afrocentric NLP is a group project by Ife Adebara (PhD researcher at the University of British Columbia), Muhammad Abdul-Mageed (Canada Research Chair in Natural Language Processing at the University of British Columbia), AbdelRahim Elmadany (Postdoctoral researcher at the University of British Columbia) and Alcides Alcoba (Research Assistant at the University of British Columbia). Take a look at AfroLID’s GitHub, working demo and installation requirements. For more background, see the related publication on the neural language identification tool. Ife has also co-authored articles on massively multilingual language models for Africa, linguistic and sociopolitical challenges in developing NLP technologies for African languages, and on using transfer learning based on pre-trained neural machine translation models to translate between similar low-resource languages.


Georgia DIP and AI Research Project

The International Telecommunication Union (ITU) has published the digital innovation profile for Georgia, a document analysing the digital innovation ecosystem in Georgia with a focus on the opportunities and challenges in the field of artificial intelligence (AI). The document was prepared in collaboration with IRCAI experts.

Highlights from the 16th ASEF Classroom Network Conference

Educators and professionals from across Europe and Asia gathered in the picturesque city of Ljubljana, Slovenia, from November 12-15, 2023, for the 16th ASEF Classroom Network Conference. Hosted by the International Research Centre on AI under the Auspices of UNESCO (IRCAI) and supported by the Ministry of Foreign and European Affairs of the Republic of Slovenia, this annual event was a success: it brought teachers together, facilitated the creation of new partnerships, answered some questions about AI in education and uncovered many new and important ones.

In Person Meeting of the African Members of the NAIXUS Network of AI Researchers on AI and Development

The NAIXUS project (Network of AI Researchers on AI and the United Nations SDGs) convened a significant meeting during the Deep Learning Indaba 2023. The purpose of this meeting was to discuss the progress of the project, share insights, and plan future actions to strengthen the network’s impact on advancing the United Nations Sustainable Development Goals (SDGs) through artificial intelligence (AI) research.

Workshop at the Deep Learning Indaba 2023: “Building a Global Network of AI Researchers on AI and the United Nations SDGs” 

The International Research Centre on AI under the auspices of UNESCO (IRCAI) and its core members in the Network of Excellence NAIXUS, a multi-stakeholder initiative aimed at bridging the gap between AI and sustainable development, hosted a meeting and a workshop in Accra, Ghana. Both events were held as part of the Deep Learning Indaba 2023 Forum on September 8 and 9 and were co-hosted by the NAIXUS members from Africa.


International Research Centre
on Artificial Intelligence (IRCAI)
under the auspices of UNESCO 

Jožef Stefan Institute
Jamova cesta 39
SI-1000 Ljubljana


The designations employed and the presentation of material throughout this website do not imply the expression of any opinion whatsoever on the part of UNESCO concerning the legal status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries.

Design by Ana Fabjan