Researchers from Brown University presented work at the International Conference on Learning Representations in Rio de Janeiro, Brazil. Michael Lepori, a PhD candidate who led the study, says the models show "some evidence" of encoding the real world's causal limits, and that this encoding lines up with human judgements.
The team tested sentences that described plausible, improbable, impossible or nonsensical events, for example sentences about ice, snow, fire or the word "yesterday." They used mechanistic interpretability, a method that studies a model's internal states, a kind of "neuroscience for AI systems." The experiments ran on several open-source models and found internal activity patterns that reflect human responses to the same sentences.
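For readers curious what such a test can look like in code, here is a minimal sketch, not the study's actual method: it runs a few made-up plausible and impossible sentences through the open-source GPT-2 model and fits a simple linear probe to the hidden states. The model choice, the example sentences and the probe are all illustrative assumptions.

```python
# A minimal sketch (not the authors' code) of probing a model's hidden
# states for plausible vs. impossible sentences. Assumes the open-source
# GPT-2 model and a scikit-learn logistic-regression probe.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")

# Hypothetical example sentences: label 1 = plausible, 0 = impossible.
sentences = [
    ("The ice melted in the warm sun.", 1),
    ("The snow fell on the quiet town.", 1),
    ("The fire froze the lake solid.", 0),
    ("Yesterday will happen tomorrow.", 0),
]

features, labels = [], []
for text, label in sentences:
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)
    # Use the mean of the last hidden layer as the sentence representation.
    vec = outputs.hidden_states[-1].mean(dim=1).squeeze(0)
    features.append(vec.numpy())
    labels.append(label)

# A linear probe: if it can separate the two classes, the hidden states
# carry some signal about whether an event is plausible.
probe = LogisticRegression(max_iter=1000).fit(features, labels)
print("Probe accuracy on its own training data:", probe.score(features, labels))
```

A real experiment would use many more sentences and held-out test data; this sketch only shows the general shape of the probing technique.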
Difficult words
- encode — to store or represent information inside something
- causal — about cause and effect between events
- judgement — an opinion or decision about something
- plausible — likely or possible to be true
- improbable — not likely to happen or be true
- nonsensical — without sense or impossible to understand
- mechanistic interpretability — a method to study a model's internal workings
- open-source — software or models with freely available code
Discussion questions
- Do you think computer models can learn cause and effect? Why or why not?
- Which example from the article (ice, snow, fire, or "yesterday") is easiest for you to imagine? Explain in one sentence.
- Have you ever seen a sentence that is impossible or nonsensical? Give a short example.