AI moderation misses most African languages
CEFR B2
20 Apr 2026
Adapted from Guest Contributor, Global Voices • CC BY 3.0
Photo by Zulfugar Karimov, Unsplash
AI systems that remove harmful content on social media often lack understanding of the continent’s many languages. Moderators report a mismatch between the languages people speak and the languages these tools can process, which affects what content stays online and what is taken down for millions of users.
A 2025 study found that only 42 African languages appear in major language models, and only four are handled consistently:
- Amharic
- Swahili
- Afrikaans
- Malagasy
Because training data is dominated by English, moderation systems produce both false positives and false negatives: content can be removed without clear explanation, while harmful posts in low-resource languages can remain online. Real cases illustrate the risks. Between January and March 2025, TikTok removed more than 450,000 videos from Kenya and banned over 43,000 accounts; by the second quarter, removals had reached 592,000. False claims also spread in Ethiopia before fact-checkers debunked them.
Researchers and civil society are working to close the gap. Groups like AfricaNLP, academic teams in Pretoria, Nairobi and Addis Ababa, and collaborations such as Cohere with HausaNLP are building datasets and improving models. The AU approved a Continental AI Strategy in July 2024 and national strategies followed, including Nigeria’s in April 2025. European rules, namely the EU AI Act (in force August 2024) and the Digital Services Act (February 2024), create pressure for greater transparency and non-discrimination, but building representative training data and operational coverage remains a practical challenge.
Difficult words
- moderation — process of checking and removing online content
- false positive — content wrongly identified as harmful
- false negative — harmful content not detected by the system
- low-resource language — language with limited digital data available
- training data — examples used to teach AI models
- transparency — openness about how systems and rules work
- debunk — show that a claim or story is false
Discussion questions
- How could limited support for many African languages in moderation tools affect daily online life for users in those countries?
- What practical difficulties do you think researchers face when building representative training data for many languages? Give one or two examples.
- What actions could platforms or governments take to reduce wrongful removals while also preventing harmful posts in low-resource languages?