A team led by Xiaoyan Bai and Chenhao Tan at the University of Chicago, with collaborators from MIT, Harvard, the University of Waterloo and Google DeepMind, studied why state-of-the-art language models fail at long multiplication. They focused on long-range dependencies: the need to hold partial products and running sums to reach a correct final answer.
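To see where those dependencies come from, consider the schoolbook algorithm: each digit of one operand produces a partial product, and a running sum must be carried across every later step. The short Python sketch below is an illustration of that bookkeeping, not code from the study.

```python
# Illustration only: schoolbook long multiplication, tracking the
# partial products and running sum that must be held across steps.
def long_multiply(a: int, b: int) -> int:
    running_sum = 0
    for position, digit_char in enumerate(reversed(str(b))):
        digit = int(digit_char)
        partial_product = a * digit * 10 ** position  # one step's partial product
        running_sum += partial_product                # carried into later steps
        print(f"digit {digit} (10^{position}): partial product = {partial_product}, "
              f"running sum = {running_sum}")
    return running_sum

assert long_multiply(1234, 5678) == 1234 * 5678  # 7,006,652
```

A model that answers in one shot must keep the equivalent of running_sum in its hidden state across every digit position, which is exactly the kind of long-range dependency the study examines.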
Under standard fine-tuning, models with 2 to 12 layers achieved less than 1% accuracy on four-digit multiplication; the researchers concluded that these models fell into a local optimum, learning surface patterns rather than storing intermediate values. In contrast, a model trained with Implicit Chain of Thought (ICoT) reached 100% accuracy, and probing showed that its hidden states encoded intermediate values, including running sums that could be decoded.
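The article doesn't describe the probing setup in detail, but the general idea of a linear probe can be sketched: fit a simple regressor that tries to read a target quantity (here, a running sum) out of hidden states. The sketch below uses synthetic stand-in data; the array names and shapes are assumptions, not the paper's actual setup.

```python
# Hypothetical probing sketch: can a linear map recover running sums
# from hidden states? The data here is a synthetic stand-in, not model output.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(2000, 256))        # per-step hidden states (assumed shape)
true_direction = rng.normal(size=256)
running_sums = hidden_states @ true_direction       # pretend sums are linearly encoded

X_tr, X_te, y_tr, y_te = train_test_split(hidden_states, running_sums, random_state=0)
probe = Ridge(alpha=1.0).fit(X_tr, y_tr)
print(f"held-out R^2: {probe.score(X_te, y_te):.3f}")  # near 1.0 when linearly decodable
```

A high held-out R^2 is the kind of evidence that a quantity is linearly decodable from hidden states; in the real experiment the targets would be the true running sums at each step of the multiplication.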
The team also tested a simple training objective that teaches a model to track running sums at each step. Adding that objective to a two-layer model raised accuracy to 99% and produced attention patterns similar to ICoT. The study argues that architectural guidance and targeted objectives can enable multi-step reasoning.
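One plausible way to implement such an objective (the paper's exact formulation may differ) is to attach a small head that predicts the running sum from each step's hidden state and mix its loss with the usual language-modeling loss. A minimal PyTorch sketch, with assumed names, shapes, and loss weight:

```python
# Hypothetical auxiliary objective: predict the running sum at each step.
# Shapes, names, and the 0.5 weight are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

batch, num_steps, hidden_dim = 32, 8, 256
sum_head = nn.Linear(hidden_dim, 1)                        # reads a scalar sum per step

hidden_states = torch.randn(batch, num_steps, hidden_dim)  # from the transformer
target_sums = torch.randn(batch, num_steps)                # ground-truth running sums
lm_loss = torch.tensor(1.0)                                # stand-in next-token loss

predicted_sums = sum_head(hidden_states).squeeze(-1)       # (batch, num_steps)
aux_loss = F.mse_loss(predicted_sums, target_sums)
total_loss = lm_loss + 0.5 * aux_loss                      # joint objective
```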
Difficult words
- long-range dependency — need to keep information across many steps
- partial product — a number from one multiplication step
- running sum — a total that updates after each step
- fine-tuning — training a model on new task data
- local optimum — a solution that is not best overall
- Implicit Chain of Thought (ICoT) — training method that encourages stepwise reasoning
Discussion questions
- Why is it helpful for a model to store intermediate values when doing long multiplication?
- Do you think the same training objective (tracking running sums) could help models in other multi-step tasks? Why or why not?
- Which is more important for multi-step reasoning: model architecture or specific training objectives? Explain with simple reasons.