Artificial intelligence tools are spreading fast, yet many people who do not speak English are not benefiting. A 2025 paper from the Stanford Institute for Human-Centered Artificial Intelligence found that many popular large language models (LLMs) perform poorly in languages other than English, in part because model developers often rely on English-language data.
Researchers noted that public LLMs, including some developed in part by Google and Meta, can generate responses that do not fit the needs of the global majority. The concentration of AI firms and data in wealthier areas such as Silicon Valley has widened the divide. News outlets reported that millions who speak languages like Kurdish and Swahili are effectively deprioritized, and users who ask for help in other languages can receive unhelpful or error-filled outputs.
Practical problems have emerged when AI is applied to everyday tasks. Wired described how asking an LLM such as ChatGPT to write an email in Tamil may produce a muddled draft in English. Attempts to increase multilingual data have sometimes backfired: the MIT Technology Review found that many low-resource language texts scraped from the web contain machine-translation mistakes. Well-meaning contributors often lack the skills to check accuracy, and their content becomes training data that reinforces errors.
The cultural effects are also significant. The Atlantic and other outlets warned that AI outputs tend to reflect the norms and values of English speakers in well-resourced countries, making non-English perspectives invisible. Observers say the tech sector’s “move fast, break things” approach continues in the age of AI. Experts and commentators suggest concrete steps: companies should work with sidelined communities and grassroots AI leaders, include local input, review outputs for accuracy and authenticity, and form collaborative partnerships that respect cultural differences.
Key recommendations
- Work with local communities
- Validate multilingual data
- Partner with grassroots developers
Difficult words
- low-resource — languages with little available digital data
- deprioritize — treat as less important or push aside
- widen — make a gap or difference larger
- reinforce — make a belief or problem stronger
- authenticity — quality of being real and culturally accurate
- grassroots — ordinary local people or community organizations
- validate — check that data is correct and acceptable
Tip: hover, focus or tap highlighted words in the article to see quick definitions while you read or listen.
Discussion questions
- How could technology companies work with local communities to improve AI outputs in other languages? Give examples of practical steps.
- What risks arise if AI continues to reflect mainly English-speaking norms and values? How might that affect culture and information access?
- Do you think validating multilingual data is realistic at scale? What challenges and possible solutions can you imagine?