Reducing unsafe responses in large language models (English, Level A2)

Large language models (LLMs) can give advice or instructions, so it is important that their responses are safe. A research team at a university studied how safety training works and tested new training ideas to reduce unsafe outputs while keeping good performance.

The researchers found two main problems. First, safety training can lower a model's accuracy, a problem called the alignment tax. Second, many models use a simple safety check that can be bypassed. The team proposed a hypothesis about this simple check and tested a method that freezes some model parts during fine-tuning to keep safety while the model learns new tasks. The work will be shown at an international conference.
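The idea of freezing can be shown with a very small sketch. This is only a toy illustration, not the researchers' method: the part names `safety_part` and `task_part`, the numbers, and the helper `sgd_step` are all invented for this example. One training step changes the part that is not frozen and leaves the frozen part as it was.

```python
def sgd_step(params, grads, frozen, lr=0.1):
    """Do one simple training step.

    Parts whose names are in `frozen` keep their old value.
    All other parts move a small step against their gradient.
    """
    return {
        name: value if name in frozen else value - lr * grads[name]
        for name, value in params.items()
    }


# Hypothetical toy "model" with two parts (names and values are made up).
params = {"safety_part": 1.0, "task_part": 1.0}
grads = {"safety_part": 0.5, "task_part": 0.5}

# Freeze the safety part, then train for one step.
updated = sgd_step(params, grads, frozen={"safety_part"})
print(updated)
```

After the step, `safety_part` still has its old value and only `task_part` has changed. Real fine-tuning tools work with millions of values, but the basic idea is the same.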

Difficult words

model — a computer program that generates or predicts text (in the article: models, model's)

safety — being protected from dangerous or harmful outputs (in the article: safety training, safety check)

alignment tax — loss of model accuracy after safety training

fine-tuning — additional training to adapt a model to new tasks

freeze — stop changing some parts during model training (in the article: freezes)

accuracy — how correct or precise a model's outputs are
