A 2025 paper from the Stanford Institute for Human-Centered Artificial Intelligence found that many popular LLMs perform poorly in languages other than English. Researchers warned that public models, including some developed in part by Google and Meta, can produce responses that do not meet the needs of the global majority.
The concentration of AI firms and data in wealthier areas such as Silicon Valley has widened the divide. News outlets reported that millions who speak languages like Kurdish and Swahili are effectively deprioritized, and users who ask for help in other languages often receive unhelpful or error-filled outputs.
Practical problems have appeared in everyday tasks. Wired explained that asking an LLM such as ChatGPT to write an email in Tamil may yield a muddled draft in English. The MIT Technology Review found that many low-resource language texts scraped from the web contain machine-translation mistakes, and well-meaning contributors often lack the skills to check accuracy. Faulty content can then become training data and reinforce errors.
Observers also note cultural effects: AI outputs tend to reflect the norms and values of English speakers in well-resourced countries, which can make non-English perspectives invisible. Experts recommend working with sidelined communities, including local input, reviewing outputs for accuracy and authenticity, and forming partnerships that respect cultural differences.
Difficult words
- concentration — the gathering of people or data together
- deprioritize — to treat something as less important
- muddle — to make something unclear or confused
- scrape — to collect information from websites automatically
- accuracy — how correct or exact information is
- reinforce — to make an idea or problem stronger
- norm — usual rules or expected behavior in society
- authenticity — the quality of being real and true
Tip: hover, focus or tap highlighted words in the article to see quick definitions while you read or listen.
Discussion questions
- Which languages or communities near you might be deprioritized by current AI models, and why?
- How could companies and researchers include local input and respect cultural differences when they build AI systems?
- How can faulty web texts become part of training data and then reinforce errors in future AI outputs?
Related articles
People with AMD Judge Car Arrival Times Like Others
A virtual reality study compared adults with age-related macular degeneration (AMD) and adults with normal vision. Both groups judged vehicle arrival times similarly; vision and sound were used together, and a multimodal benefit did not appear.
Cyber risks to US infrastructure as strikes on Iran continue
As US strikes on Iran continue, experts warn of possible retaliatory cyberattacks on essential US infrastructure. Alex K. Jones reviewed the risks, naming water systems, power grids and the possible future effect of quantum computing.
New cocoa fermenting box boosts farmers' incomes
In Kasawo, a locally made single cocoa fermenting box improves bean fermentation and helps farmers sell directly to exporters. Researchers report faster, better fermentation, higher prices and plans to scale up production across cocoa districts.