A 2025 paper from the Stanford Institute for Human-Centered Artificial Intelligence found that many popular LLMs perform poorly in languages other than English. Researchers warned that public models, including some developed in part by Google and Meta, can produce responses that do not meet the needs of the global majority.
The concentration of AI firms and data in wealthier areas such as Silicon Valley has widened the divide. News outlets reported that millions who speak languages like Kurdish and Swahili are effectively deprioritized, and users who ask for help in other languages often receive unhelpful or error-filled outputs.
Practical problems have appeared in everyday tasks. Wired explained that asking an LLM such as ChatGPT to write an email in Tamil may yield a muddled draft in English. The MIT Technology Review found that many low-resource language texts scraped from the web contain machine-translation mistakes, and well-meaning contributors often lack the skills to check accuracy. Faulty content can then become training data and reinforce errors.
Observers also note cultural effects: AI outputs tend to reflect the norms and values of English speakers in well-resourced countries, which can make non-English perspectives invisible. Experts recommend working with sidelined communities, incorporating local input, reviewing outputs for accuracy and authenticity, and forming partnerships that respect cultural differences.
Difficult words
- concentration — the gathering of people or data together
- deprioritize (deprioritized) — to treat something as less important
- muddle (muddled) — to make something unclear or confused
- scrape (scraped) — to collect information from websites automatically
- accuracy — how correct or exact information is
- reinforce — to make an idea or problem stronger
- norm (norms) — a usual rule or expected behavior in society
- authenticity — the quality of being real and true
Discussion questions
- Which languages or communities near you might be deprioritized by current AI models, and why?
- How could companies and researchers include local input and respect cultural differences when they build AI systems?
- How can faulty web texts become part of training data and then reinforce errors in future AI outputs?