Federico Germani and Giovanni Spitale tested four LLMs: OpenAI o3-mini, DeepSeek Reasoner, xAI Grok 2, and Mistral. Each model generated 50 narrative statements on 24 controversial topics, such as vaccination mandates, geopolitics, and climate change policies. The team then asked the models to evaluate the same texts under different source conditions.
When no author information was given, agreement across models was over 90%. But adding fictional authors changed the results: agreement fell sharply, and a clear bias against texts attributed to Chinese authors appeared. The researchers collected 192,000 assessments and warn that these hidden biases matter for real-world uses such as content moderation and hiring. They recommend transparency, governance, and using LLMs to assist reasoning, not to replace it.
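The headline figure is an agreement rate: how often the models reach the same verdict on the same statement. As a rough illustration only (this is not the authors' analysis code, and the models' verdicts shown here are invented), average pairwise agreement can be computed like this:

```python
from itertools import combinations

# Hypothetical verdicts ("agree" / "disagree") from each model on the same
# three statements under one source condition. Real study data differs.
assessments = {
    "o3-mini":           ["agree", "agree", "disagree"],
    "DeepSeek Reasoner": ["agree", "agree", "disagree"],
    "Grok 2":            ["agree", "disagree", "disagree"],
    "Mistral":           ["agree", "agree", "agree"],
}

def pairwise_agreement(a, b):
    """Fraction of statements on which two models give the same verdict."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)

# Average agreement over all pairs of models.
pairs = list(combinations(assessments, 2))
avg = sum(pairwise_agreement(assessments[m1], assessments[m2])
          for m1, m2 in pairs) / len(pairs)
print(f"Average pairwise agreement: {avg:.0%}")
```

Comparing this average with and without author information is one simple way to see how much the source condition shifts the models' judgments.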
Difficult words
- researcher — A person who studies or investigates something.
- evaluate — To judge or calculate the value or quality.
- bias — An unfair preference or dislike.
- nationality — The status of belonging to a specific nation.
- concerns — Worries or issues that need attention.
- judgments — Decisions about someone or something.
- moderation — The process of managing or controlling content.
Discussion questions
- Why is it important to consider an author's background?
- How can biases in AI affect hiring decisions?
- In what other areas might AI evaluation cause problems?