Researchers Federico Germani and Giovanni Spitale at the University of Zurich tested four widely used LLMs: OpenAI o3-mini, Deepseek Reasoner, xAI Grok 2 and Mistral. First, each model generated 50 narrative statements on 24 controversial topics, including vaccination mandates, geopolitics and climate change policies. The team then asked the models to evaluate the statements under different conditions: sometimes no source was given; other times each text was attributed to a human of a particular nationality or to another LLM. In total, the researchers collected 192,000 assessments.
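To make the design concrete, here is a minimal sketch of such an evaluation loop in Python. It is not the authors' code: the function names are hypothetical placeholders for real LLM calls, and the topic and attribution lists are trimmed to a few illustrative entries.

```python
from itertools import product
from typing import Optional

# Illustrative slices of the design (the full study used 24 topics,
# 50 statements per model, and more attribution conditions).
MODELS = ["o3-mini", "Deepseek Reasoner", "Grok 2", "Mistral"]
TOPICS = ["vaccination mandates", "Taiwan's sovereignty", "climate change policy"]
SOURCES = [None, "a person from China", "a person from France", "another LLM"]

def generate_statement(model: str, topic: str) -> str:
    # Placeholder for a real API call; in the study, each model wrote
    # narrative statements on each topic.
    return f"[{model}] a narrative statement about {topic}"

def rate_agreement(evaluator: str, statement: str, source: Optional[str]) -> float:
    # Placeholder for a real API call asking the evaluator how strongly it
    # agrees with the statement, optionally naming a (fictional) author.
    return 0.0

assessments = []
for author, topic in product(MODELS, TOPICS):
    statement = generate_statement(author, topic)
    for evaluator, source in product(MODELS, SOURCES):
        score = rate_agreement(evaluator, statement, source)
        assessments.append((evaluator, topic, source, score))

# Crossing authors, evaluators, topics, statements and attribution
# conditions is what scales the design up to 192,000 assessments.
print(len(assessments))  # 4 models x 3 topics x 4 evaluators x 4 sources = 192 here
```

Comparing each evaluator's scores across the `SOURCES` conditions for the same statement is what lets a study like this separate a model's view of the content from its reaction to the claimed author.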
With no source information, the models agreed with one another more than 90% of the time, leading Spitale to say, “There is no LLM war of ideologies.” But when fictional sources were added, agreement fell sharply and hidden biases appeared. The most striking finding was a strong anti-Chinese bias across all models, including Deepseek itself. On geopolitical topics such as Taiwan’s sovereignty, Deepseek lowered its agreement by up to 75% simply because it expected a Chinese author to hold a different view.
The study also found that LLMs tend to trust statements attributed to human authors more than those attributed to other AIs. The researchers warn that such biases could affect content moderation, hiring, academic review and journalism, and they call for transparency and governance. They recommend using LLMs as assistants for reasoning, not as judges.
Difficult words
- bias — a tendency to favor one thing over another.
- evaluate — to judge or assess something.
- identity — how someone or something is described or recognized.
- reveal — to make something known or visible.
- transparency — openness and clarity about actions and decisions.
- trust — to believe in someone's reliability or truth.
- consequences — results or effects of an action.
Discussion questions
- How can biases in AI be addressed effectively?
- What are the potential risks of using AI in decision-making?
- In what ways can transparency improve AI trustworthiness?
- How might AI biases affect different social contexts?