
New training method helps models do long multiplication

29 Dec 2025

Level B2 – Upper-intermediate
6 min
350 words

New research explains why modern large language models struggle with a seemingly simple task: multiplying multi-digit numbers. The study examines how current training methods affect a model’s ability to store and reuse intermediate results, a capability required for long calculations with long-range dependencies, such as holding partial products and running sums across many steps.
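
To see what “holding intermediate results” means in practice, here is a short Python sketch of grade-school long multiplication (an illustration only, not code from the study): every digit-pair product has to be stored and added into a running sum before the final digits can be written out.

    # Grade-school long multiplication: digit-pair products are stored in a
    # running-sum array before carries are propagated (illustration only).
    def long_multiply(a: int, b: int) -> int:
        a_digits = [int(d) for d in str(a)][::-1]   # least-significant digit first
        b_digits = [int(d) for d in str(b)][::-1]
        running = [0] * (len(a_digits) + len(b_digits))
        for i, da in enumerate(a_digits):
            for j, db in enumerate(b_digits):
                running[i + j] += da * db           # store each digit-pair product
        carry = 0
        for k in range(len(running)):               # fold carries into the running sums
            total = running[k] + carry
            running[k] = total % 10
            carry = total // 10
        return int("".join(str(d) for d in reversed(running)).lstrip("0") or "0")

    print(long_multiply(1234, 5678))  # 7006652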

Researchers led by Xiaoyan Bai and Chenhao Tan at the University of Chicago, with collaborators from MIT, Harvard, the University of Waterloo and Google DeepMind, compared standard fine-tuning with an alternative training method called Implicit Chain of Thought (ICoT). Under standard fine-tuning, models with 2 to 12 layers achieved less than 1% accuracy on four-digit multiplication because they fell into a local optimum: they learned superficial patterns in the data but did not develop a mechanism to store intermediate values for later steps. By contrast, the ICoT-trained model reached 100% accuracy.
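
The article does not show the paper’s exact data, but a rough, hypothetical illustration of the difference is below: standard fine-tuning trains on question-answer pairs only, while chain-of-thought-style data spells out the intermediate steps that ICoT training aims to internalise into the model’s hidden states.

    # Hypothetical data formats (not the paper's actual examples):
    answer_only = {
        "prompt": "1234 * 5678 =",
        "target": "7006652",
    }
    with_steps = {
        "prompt": "1234 * 5678 =",
        "target": "1234*8=9872, 1234*70=86380, 1234*600=740400, "
                  "1234*5000=6170000, sum=7006652",
    }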

Probes of the models’ internal states showed that the ICoT model encodes intermediate values: the researchers could decode running sums from its hidden states. ICoT also organizes attention along distinct temporal pathways. Early layers compute and store digit-pair products at specific locations, while later layers retrieve those values to form each digit of the final answer. The team also observed digit representations built on Fourier-like bases and a geometric operation resembling a Minkowski sum, both of which emerged during training.
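
As a hedged sketch of the probing idea (the variable names, shapes and probe type are assumptions, not the authors’ setup), one can fit a simple linear probe that tries to read a running-sum digit out of recorded hidden states; high probe accuracy would suggest the model really encodes that intermediate value.

    # Linear probe sketch: random arrays stand in for real hidden states and
    # running-sum labels, so the score here stays near chance (about 10%).
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    hidden_states = rng.normal(size=(1000, 64))          # stand-in hidden states
    running_sum_digit = rng.integers(0, 10, size=1000)   # stand-in labels (0-9)

    probe = LogisticRegression(max_iter=1000)
    probe.fit(hidden_states[:800], running_sum_digit[:800])
    print("probe accuracy:", probe.score(hidden_states[800:], running_sum_digit[800:]))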

The authors then added a training objective that explicitly teaches a model to track running sums at each step. Applied to a two-layer model, this objective raised accuracy to 99% without explicit chain-of-thought supervision; the model developed attention mechanisms and strategies for tracking multiple digit pairs similar to those seen under ICoT. The study highlights that scaling alone does not fix some limits, and that architectural guidance and targeted objectives can enable multi-step reasoning. "As AI is increasingly integrated into critical decision-making, it’s essential to understand its unique ways of learning and thinking," says Tan.
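
A minimal sketch of the general idea behind such an auxiliary objective (an assumption, not the authors’ implementation): alongside the usual next-token loss, a small extra head is trained to predict the running sum from the hidden states, which pushes the model to keep that intermediate value available.

    # Combined objective sketch: next-token loss plus a running-sum prediction
    # loss from an extra linear head (sizes and weighting are made up).
    import torch
    import torch.nn as nn

    hidden_dim, vocab_size, num_sum_classes = 64, 16, 100
    lm_head = nn.Linear(hidden_dim, vocab_size)
    sum_head = nn.Linear(hidden_dim, num_sum_classes)    # predicts the running sum
    ce = nn.CrossEntropyLoss()

    def combined_loss(hidden, next_tokens, running_sums, aux_weight=0.5):
        lm_loss = ce(lm_head(hidden), next_tokens)       # usual language-model loss
        aux_loss = ce(sum_head(hidden), running_sums)    # running-sum tracking loss
        return lm_loss + aux_weight * aux_loss

    hidden = torch.randn(32, hidden_dim)                 # dummy hidden states
    loss = combined_loss(hidden,
                         torch.randint(0, vocab_size, (32,)),
                         torch.randint(0, num_sum_classes, (32,)))
    loss.backward()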

  • Key mechanisms: encoded intermediate values
  • Distinct attention pathways across time
  • Fourier-like digit representation observed
  • Targeted objectives can greatly improve performance

Difficult words

  • intermediate – values or steps between start and end
  • fine-tuning – adjusting a trained model with new data
  • local optimum – a solution that is not globally best
  • encode – store information in an internal form
  • attention – mechanism to weigh or focus information
  • temporal – related to time or sequence order
  • objective – a specific goal used during training
  • scale – increase model size or computing resources
  • mechanism – a structure or process that produces behavior

Tip: hover, focus or tap highlighted words in the article to see quick definitions while you read or listen.

Discussion questions

  • How might targeted training objectives like the running-sum objective change the reliability of AI in real-world decisions?
  • Can the attention and intermediate-value strategies described for multiplication be useful for other multi-step tasks? Why or why not?
  • The article says scaling alone does not fix some limits. What trade-offs should researchers consider between scaling models and adding architectural guidance?

Related articles

Can Lost Vision Be Restored? — Level B2
31 Dec 2025

A new video with Juliette McGregor of the University of Rochester Medical Center explains that blindness is a spectrum. It looks at treatments, assistive support and ongoing research into retinal damage and future therapies.