A team led by Xiaoyan Bai and Chenhao Tan at the University of Chicago, with collaborators from MIT, Harvard, the University of Waterloo and Google DeepMind, studied why state-of-the-art language models fail at long multiplication. They focused on long-range dependencies: the need to hold partial products and running sums to reach a correct final answer.
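To make "partial products and running sums" concrete, here is a minimal Python sketch of schoolbook multiplication that records every intermediate value a solver must keep in mind. It is only an illustration of the arithmetic. The function name `long_multiply` is hypothetical and is not code from the study.

```python
def long_multiply(a: int, b: int):
    """Multiply a by b the schoolbook way, recording each intermediate value."""
    partial_products = []
    running_sums = []
    total = 0
    # Work through the digits of b from least to most significant.
    for position, digit_char in enumerate(reversed(str(b))):
        partial = a * int(digit_char) * 10 ** position  # shifted partial product
        partial_products.append(partial)
        total += partial  # this running sum must be remembered for the next step
        running_sums.append(total)
    return partial_products, running_sums, total


partials, sums, product = long_multiply(1234, 5678)
print(partials)  # [9872, 86380, 740400, 6170000]
print(sums)      # [9872, 96252, 836652, 7006652]
print(product)   # 7006652
```

Each step depends on the running sum from the step before. If that value is lost, the final answer is wrong, which is exactly the kind of long-range dependency the researchers describe.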
Under standard fine-tuning, models with two to 12 layers achieved less than 1% accuracy on four-digit multiplication. The researchers concluded that these models had become stuck in a local optimum, learning surface patterns instead of storing intermediate values. In contrast, a model trained with Implicit Chain of Thought (ICoT) reached 100% accuracy. Probing the ICoT model showed that its hidden states encoded intermediate values and that running sums could be decoded from them.
The team also tested a simple training objective that teaches a model to track the running sum at each step. Adding that objective to a two-layer model raised its accuracy to 99% and produced attention patterns similar to those of the ICoT model. The study argues that architectural guidance and targeted training objectives can enable multi-step reasoning.
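The article does not give the exact form of the running-sum objective. The Python sketch below shows one plausible version under stated assumptions: the targets are column-wise running sums (column product sum plus the carry from the previous column), and the extra objective is a mean squared error added to the usual next-token loss. The names `column_running_sums`, `combined_loss` and `weight`, and the idea that `predicted_sums` come from a probe on the model's hidden states, are all hypothetical illustrations, not the study's code.

```python
def column_running_sums(a: int, b: int):
    """Per-output-digit running sums for a * b (assumed target definition)."""
    a_digits = [int(d) for d in reversed(str(a))]  # least significant digit first
    b_digits = [int(d) for d in reversed(str(b))]
    sums, digits = [], []
    carry = 0
    for k in range(len(a_digits) + len(b_digits)):
        # Add up every digit product that lands in output column k.
        column = sum(a_digits[i] * b_digits[k - i]
                     for i in range(len(a_digits)) if 0 <= k - i < len(b_digits))
        running = column + carry
        sums.append(running)          # value the model would be trained to track
        digits.append(running % 10)   # output digit for column k
        carry = running // 10         # carried into the next column
    return sums, digits


def combined_loss(token_loss, predicted_sums, target_sums, weight=1.0):
    """Hypothetical total loss: token loss plus weighted MSE on running sums."""
    mse = sum((p - t) ** 2 for p, t in zip(predicted_sums, target_sums)) / len(target_sums)
    return token_loss + weight * mse


sums, digits = column_running_sums(1234, 5678)
print(sums)    # [32, 55, 66, 66, 40, 20, 7, 0]
print(digits)  # [2, 5, 6, 6, 0, 0, 7, 0] (read backwards: 07006652)
```

Supervising these intermediate values directly gives the model a reason to store them, instead of hoping it discovers them from the final answer alone.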
Difficult words
- long-range dependency — need to keep information across many steps
- partial product — a number from one multiplication step
- running sum — a total that updates after each step
- fine-tuning — training a model on new task data
- local optimum — a solution that is better than nearby options but not the best overall
- implicit chain of thought (ICoT) — training method that encourages stepwise reasoning
Discussion questions
- Why is it helpful for a model to store intermediate values when doing long multiplication?
- Do you think the same training objective (tracking running sums) could help models in other multi-step tasks? Why or why not?
- Which is more important for multi-step reasoning: model architecture or specific training objectives? Explain with simple reasons.