Researchers from Brown University presented a study at the International Conference on Learning Representations in Rio de Janeiro, Brazil, that probed whether language models encode knowledge about the real world. Michael Lepori, the PhD candidate who led the work, reports "some evidence that language models have encoded something like the causal constraints of the real world," and says the models' internal representations predict human plausibility judgments.
The team designed an experiment with sentences of varying plausibility: commonplace statements like "Someone cooled a drink with ice," improbable examples such as "Someone cooled a drink with snow," impossible cases like "Someone cooled a drink with fire," and nonsensical lines like "Someone cooled a drink with yesterday." Using mechanistic interpretability, described by the authors as "neuroscience for AI systems," the researchers examined the models' internal mathematical states to see what the systems encode.
The study tested several open-source models, including OpenAI's GPT-2, Meta's Llama 3.2 and Google's Gemma 2. The researchers found that sufficiently large models developed distinct internal vectors that corresponded to plausibility categories and could even separate similar categories, such as improbable versus impossible, with roughly 85% accuracy. These vectors also reflected human uncertainty on ambiguous statements and began to appear in models with more than 2 billion parameters, a small size compared with today's trillion-plus-parameter systems.
- Mechanistic interpretability can reveal what models encode.
- Vectors map to human plausibility judgments.
- Findings may aid development of smarter, more trustworthy models.
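To make the idea of a "plausibility vector" more concrete, here is a minimal sketch of one common way to look for such a direction: take the difference between the average internal state of two sentence categories, then check whether projecting held-out states onto that direction separates them. This is an illustration only, not the study's method. The "hidden states" are synthetic random numbers rather than activations from a real model, and the function names, the 64-dimensional size, and the cluster separation are all assumptions chosen for the example, so the printed accuracy says nothing about the study's 85% figure.

```python
import numpy as np


def difference_of_means_direction(states_a, states_b):
    """Unit vector pointing from the mean of class B to the mean of class A."""
    direction = states_a.mean(axis=0) - states_b.mean(axis=0)
    return direction / np.linalg.norm(direction)


def projection_accuracy(direction, train_a, train_b, test_a, test_b):
    """Classify held-out states by which side of a threshold their projection falls on."""
    # Threshold: midpoint between the two projected class means on the training split.
    threshold = 0.5 * ((train_a @ direction).mean() + (train_b @ direction).mean())
    correct_a = (test_a @ direction > threshold).sum()
    correct_b = (test_b @ direction <= threshold).sum()
    return (correct_a + correct_b) / (len(test_a) + len(test_b))


def make_synthetic_states(rng, n, dim, offset):
    """Hypothetical stand-in for model hidden states: Gaussian noise plus a fixed shift."""
    return rng.normal(size=(n, dim)) + offset


rng = np.random.default_rng(0)
dim = 64  # assumed size; real models use much larger hidden states

# Assumed ground-truth axis along which the two synthetic categories differ.
true_axis = rng.normal(size=dim)
true_axis /= np.linalg.norm(true_axis)

improbable = make_synthetic_states(rng, 400, dim, +1.0 * true_axis)
impossible = make_synthetic_states(rng, 400, dim, -1.0 * true_axis)

# Fit the direction on 300 examples per class, evaluate on the remaining 100.
train_a, test_a = improbable[:300], improbable[300:]
train_b, test_b = impossible[:300], impossible[300:]

direction = difference_of_means_direction(train_a, train_b)
accuracy = projection_accuracy(direction, train_a, train_b, test_a, test_b)
print(f"held-out accuracy: {accuracy:.2f}")
```

With a real model, the synthetic arrays would be replaced by internal states recorded while the model reads each sentence. The held-out split matters: measuring accuracy on the same examples used to find the direction would overstate how well it separates the categories.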
Difficult words
- encode (encoded): store information in a different form
- plausibility: how likely or believable something seems
- mechanistic interpretability: study of internal mechanisms in machine learning models
- representation (representations): internal model state that stores information
- vector (vectors): a mathematical list of numbers used by models
- parameter (parameters): a numeric value that controls model behavior
- causal constraint (causal constraints): a rule linking cause and effect in the real world
Discussion questions
- How could knowledge of internal vectors help developers make language models more trustworthy?
- What risks or limitations might remain even if models encode causal constraints?
- Given that these vectors appeared in models with more than two billion parameters, how should teams balance model size and explainability when choosing a model?