AI moderation misses most African languages
CEFR B2
20 Apr 2026
Adapted from Guest Contributor, Global Voices • CC BY 3.0
Photo by Zulfugar Karimov, Unsplash
AI systems that remove harmful content on social media often fail to understand many of Africa's languages. Moderators report a mismatch between the languages people speak and the languages these tools can process, which affects what content stays online and what is taken down for millions of users.
A 2025 study found that only 42 African languages appear in major language models, and that only four of them are handled consistently:
- Amharic
- Swahili
- Afrikaans
- Malagasy
Because training data is dominated by English, moderation systems produce both false positives and false negatives: content can be removed without clear explanation, while harmful posts in low-resource languages can remain online. Real cases illustrate the risks. Between January and March 2025, TikTok removed more than 450,000 videos from Kenya and banned over 43,000 accounts; by the second quarter, removals had reached 592,000. False claims also spread in Ethiopia before fact-checkers debunked them.
Researchers and civil society are working to close the gap. Groups like AfricaNLP, academic teams in Pretoria, Nairobi and Addis Ababa, and collaborations such as the one between Cohere and HausaNLP are building datasets and improving models. The African Union approved a Continental AI Strategy in July 2024, and national strategies followed, including Nigeria's in April 2025. European rules, the EU AI Act (in force since August 2024) and the Digital Services Act (fully applicable since February 2024), create pressure for greater transparency and non-discrimination, but building representative training data and operational coverage remains a practical challenge.
Difficult words
- moderation — process of checking and removing online content
- false positive — content wrongly identified as harmful
- false negative — harmful content not detected by the system
- low-resource language — language with limited digital data available
- training data — examples used to teach AI models
- transparency — openness about how systems and rules work
- debunk — show that a claim or story is false
Discussion questions
- How could limited support for many African languages in moderation tools affect daily online life for users in those countries?
- What practical difficulties do you think researchers face when building representative training data for many languages? Give one or two examples.
- What actions could platforms or governments take to reduce wrongful removals while also preventing harmful posts in low-resource languages?