New training method helps models do long multiplication
CEFR B1

29 Dec 2025

Level B1 – Intermediate
4 min
182 words

A team led by Xiaoyan Bai and Chenhao Tan at the University of Chicago, with collaborators from MIT, Harvard, the University of Waterloo and Google DeepMind, studied why state-of-the-art language models fail at long multiplication. They focused on long-range dependencies: the need to hold partial products and running sums to reach a correct final answer.
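To see the idea concretely, here is a small illustration (not code from the study): long multiplication done digit by digit. Each step makes a partial product, and a running sum must be remembered until the final answer.

    # Illustration only: multiply two numbers the "long multiplication" way.
    # Each digit gives a partial product; the running sum must be kept in
    # memory across every step to reach the correct final answer.
    def long_multiply(a: int, b: int) -> int:
        running_sum = 0
        for position, digit_char in enumerate(reversed(str(b))):
            digit = int(digit_char)
            partial_product = a * digit * (10 ** position)  # one multiplication step
            running_sum += partial_product                   # total that updates each step
            print(f"digit {digit}: partial product {partial_product}, running sum {running_sum}")
        return running_sum

    assert long_multiply(2345, 6789) == 2345 * 6789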

Under standard fine-tuning, models with two to twelve layers achieved less than 1% accuracy on four-digit multiplication; the researchers concluded these models fell into a local optimum, learning surface patterns rather than storing intermediate values. In contrast, a model trained with Implicit Chain of Thought (ICoT) reached 100% accuracy. Probing the ICoT model showed that its hidden states encoded intermediate values and that running sums could be decoded from them.
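As a rough illustration of what probing means here (a minimal sketch, not the authors' code; the hidden size and data shapes are assumptions), a small linear layer can be trained to read the running sum out of a hidden state while the model itself stays frozen:

    import torch
    import torch.nn as nn

    # Minimal probing sketch (illustrative, with an assumed hidden size):
    # a linear probe learns to predict the running sum from a hidden state
    # taken from the trained model, which is not updated.
    hidden_size = 256
    probe = nn.Linear(hidden_size, 1)
    optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)

    def probe_step(hidden_state: torch.Tensor, running_sum: torch.Tensor) -> float:
        # hidden_state: (batch, hidden_size); running_sum: (batch, 1)
        prediction = probe(hidden_state.detach())  # detach: leave the model unchanged
        loss = nn.functional.mse_loss(prediction, running_sum)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()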

The team also tested a simple training objective that teaches a model to track running sums at each step. Adding that objective to a two-layer model raised accuracy to 99% and produced attention patterns similar to those of the ICoT model. The study argues that architectural guidance and targeted objectives can enable multi-step reasoning.
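A minimal sketch of such an objective (our illustration, not the paper's implementation; the prediction head and the loss weight are assumptions) adds a second loss that asks the model to predict the running sum at every step:

    import torch
    import torch.nn as nn

    # Illustrative sketch: the usual next-token loss plus an auxiliary loss
    # that predicts the running sum at each step, pushing the model to store
    # that intermediate value in its hidden states.
    def combined_loss(logits: torch.Tensor, targets: torch.Tensor,
                      hidden_states: torch.Tensor, running_sums: torch.Tensor,
                      sum_head: nn.Linear, aux_weight: float = 0.5) -> torch.Tensor:
        # logits: (batch, seq, vocab); targets: (batch, seq)
        token_loss = nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
        # hidden_states: (batch, seq, hidden); running_sums: (batch, seq, 1)
        sum_loss = nn.functional.mse_loss(sum_head(hidden_states), running_sums)
        return token_loss + aux_weight * sum_loss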

Difficult words

  • long-range dependency – need to keep information across many steps
  • partial product – a number from one multiplication step
  • running sum – a total that updates after each step
  • fine-tuning – training a model on new task data
  • local optimum – a solution that is not best overall
  • Implicit Chain of Thought (ICoT) – training method that encourages stepwise reasoning


Discussion questions

  • Why is it helpful for a model to store intermediate values when doing long multiplication?
  • Do you think the same training objective (tracking running sums) could help models in other multi-step tasks? Why or why not?
  • Which is more important for multi-step reasoning: model architecture or specific training objectives? Explain with simple reasons.
