As artificial intelligence (AI) reshapes how people access information, communicate and learn, a group of South African researchers is working to ensure that African languages are not left behind.
Researchers from the University of Cape Town (UCT) have joined colleagues from three other universities in a national collaboration to develop AI tools that better understand and serve African languages such as isiXhosa, isiZulu and Sepedi.
The project forms part of a new initiative supported by the National Research Foundation (NRF) and the Telkom Centres of Excellence programme, which has funded information and communications technology research in South Africa for more than two decades.
Unlike previous initiatives, the new project intentionally brings together researchers from multiple institutions. Led by Professor Matthew Adigun and Professor Alfredo Terzoli of the University of Zululand, Associate Professor Thipe Modipa of the University of Limpopo, Dr Phumzile Nomnga of the University of Fort Hare and Associate Professor Melissa Densmore from UCT, the collaboration seeks to pool expertise from across the country. It will fund master’s and PhD students as well as postdoctoral researchers across the institutions.
Recently, UCT hosted the latest consortium meeting on campus to share developments in the project and chart the way forward.
“It’s one of the first projects where the Centres of Excellence are working across institutions,” explained UCT researcher Associate Professor Densmore. “The idea is to build collaboration between universities while developing new innovations and technologies in the ICT sector.”
At the heart of the project is the development of large language models (LLMs) – the type of AI system that powers tools like chatbots and digital assistants. But building such systems for African languages poses unique challenges.
Tackling the data gap
Most existing AI language models are trained on vast amounts of digital text collected from the internet – including social media, news websites and online forums. For many African languages, however, such digital data is limited.
“The amount of text available in languages like isiZulu or isiXhosa is much smaller than what exists for English or other widely used languages,” said Dr Jan Buys, a UCT researcher involved in the project. “So, one of the research challenges is how to develop models that still work effectively, even when the data available is limited.”
To address this gap, researchers are searching for underutilised sources of language data, including printed materials in libraries and archives that have never been digitised.
But the team’s work goes beyond simply gathering text. They are also exploring new techniques to train language models more efficiently when data is scarce.
Another challenge lies in the linguistic structure of African languages themselves.
“These languages are morphologically complex,” Dr Buys explained. “The structure of the words can be quite intricate, so we need algorithms that can handle that complexity. If we can model those structures correctly, it can make the learning process more efficient.”
While the technical work is crucial, the researchers stress that building AI systems for African languages also raises important ethical and societal questions.
As part of the project, the team plans to consult language experts, AI specialists and native speakers to better understand the broader implications of the technology.
“We want to talk to people who speak these languages about the potential impact of AI tools and what the trajectory of this kind of research should look like,” Densmore said.
The aim, she added, is to ensure that the development of AI technologies reflects local needs and values rather than simply replicating systems designed elsewhere.
“This is about shaping global AI knowledge rather than just importing and using technologies that have been created in other parts of the world.”
The project also aligns with broader efforts at UCT to expand research and collaboration around AI.
Researchers across the university are working together under an emerging AI initiative aimed at bringing together experts from different faculties to explore both the fundamental science and practical applications of AI.
The long-term goal is to establish a dedicated AI institute that advances research while addressing societal challenges.
Why language matters
Improving AI tools for African languages could have far-reaching implications in areas such as healthcare, education and public services.
Currently, many widely used AI systems struggle to respond accurately when users ask questions in less-resourced languages.
“When people search for information in their own language, the responses are often worse,” Densmore said. “They might be poorly translated, poorly framed, or simply incorrect because the system doesn’t have enough relevant content in that language.”
In fields such as healthcare, this can have serious consequences.
“If someone is looking for health information and the system gives inaccurate or misleading answers – that becomes a real problem from a misinformation standpoint,” she added.
Developing stronger language models could help ensure that people receive reliable information in the languages they speak most comfortably.
The benefits extend beyond information access. Language technologies can also strengthen communication between professionals and communities.
Densmore pointed to earlier projects that supported bilingual communication between healthcare workers and parents in neonatal intensive care units. In those initiatives, digital tools allowed users to listen to information in English while reading it in isiXhosa, helping bridge communication gaps.
“Even simple tools that support multiple languages can help people learn vocabulary and communicate more effectively,” she said.
Building tools with communities
Another key goal of the project is to involve communities in shaping how AI tools are developed.
Through previous research and community engagement projects, Densmore has seen firsthand how people want technologies that reflect the languages and dialects they use daily.
“In one community we worked with, people said they would love to have a chatbot that speaks their local dialect – the language they use at home,” she said. “It would feel more like something that belongs to them.”
Ultimately, the researchers hope their work will help empower communities to build their own digital tools.
“My long-term vision is that people can build technologies themselves in their own languages,” Densmore said. “Whether those are powered by language models or other kinds of AI, the key is that communities have ownership over them.”
Running until 2027, the project will support postgraduate research, including PhD and master’s students at UCT. But its biggest impact may lie in strengthening collaboration between institutions working on similar challenges.
For Densmore, who has worked with many international partners, the project has also opened new opportunities to connect with researchers across South Africa.
“This is a really great opportunity to understand and collaborate more closely with other universities in the country,” she said.
Rather than promising immediate breakthroughs, the team sees the project as the beginning of a longer journey.
“I hope this is the start of an ongoing collaboration,” Densmore said. “If we can start asking better and more relevant questions about African language technologies and their role in communities, that will already be a significant step forward.”
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.