Federico Germani and Giovanni Spitale tested four LLMs: OpenAI o3-mini, Deepseek Reasoner, xAI Grok 2 and Mistral. Each model generated 50 narrative statements on 24 controversial topics such as vaccination mandates, geopolitics and climate-change policy. The team then asked the models to evaluate the same texts under different source conditions.
When no author information was given, agreement across models exceeded 90%. But adding fictional authors changed the results: agreement fell sharply and a clear anti-Chinese bias appeared. The researchers collected 192,000 assessments in total and warn that such hidden biases matter for real-world applications. They recommend transparency, governance and using LLMs to assist human reasoning, not to replace it.
Difficult words
- researcher — A person who studies or investigates something.
- evaluate — To judge or calculate the value or quality of something.
- bias — An unfair preference or dislike.
- nationality — The status of belonging to a specific nation.
- concerns — Worries or issues that need attention.
- judgments — Decisions about someone or something.
- moderation — The process of managing or controlling content.
Tip: hover, focus or tap highlighted words in the article to see quick definitions while you read or listen.
Discussion questions
- Why is it important to consider an author's background?
- How can biases in AI affect hiring decisions?
- In what other areas might AI evaluation cause problems?
Related articles
Inequality and Pandemics: Why Science Alone Is Not Enough
Matthew M. Kavanagh argues that science can detect viruses and develop vaccines quickly, but rising inequality makes pandemics worse. He proposes debt relief, shared technology, regional manufacturing and stronger social support to prevent future crises.