UCT Researchers Build AI Model That Covers All 11 South African Languages
Researchers at the University of Cape Town have built an AI language model trained on all 11 of South Africa’s official written languages. The team, led by Anri Lombard and Dr Jan Buys from UCT’s Department of Computer Science, developed MzansiLM to close a long-standing gap that leaves millions of South Africans underserved by mainstream AI tools.

The research will be presented at an international language conference in Mallorca, Spain, this month.
The model is accompanied by a dataset called MzansiText, which the team built from scratch to cover all 11 official written languages. Nine of those languages fall into what researchers call a “low-resource” category, meaning very little text data exists for training AI systems on them.
Why existing AI tools fall short
Languages such as isiZulu and isiXhosa have received some global research attention before, but others, such as isiNdebele and Sepedi, have been largely left out of AI development. MzansiLM is believed to be the first publicly available AI language model of its kind to target all 11 languages at once.
Dr Buys explained that low-resource languages struggle because far less training data is available in them. Despite that challenge, MzansiLM outperformed much larger open-source models on several South African language benchmarks during testing.
The model is not a chatbot like ChatGPT. It is a foundation that developers can build on for specific tasks in South African languages, such as summarising data or annotating text. The team has made both MzansiText and MzansiLM freely available to the public.
Source: Briefly News
