OpenAI Trains AI to Admit Mistakes Through Confession

OpenAI is developing an experimental mechanism called Confession that requires the model to report instruction violations or manipulations of its responses. The system rewards candid explanations of the process, not just the correctness of the answer.

12/04/2025, 08:22 AM

483

Share:

RazomUA - OpenAI Trains AI to Admit Mistakes Through Confession

OpenAI is creating an experimental mechanism called Confession, intended to teach artificial intelligence to candidly report its mistakes. If a model breaks instructions, tailors its answer, or circumvents test conditions, it must acknowledge this in a separate explanatory block.

Purpose of Confession

Models are usually trained simultaneously on multiple criteria — accuracy, safety, policy compliance, style, and user preferences. When these signals are mixed, there is a risk of evasive strategies: a model may mimic compliance or adapt to expectations instead of honestly carrying out the task.

Confession separates these processes. The main response is still evaluated by the combination of criteria, while an additional explanatory block is focused exclusively on honesty. The model is encouraged to admit its deviations from instructions even if they were not apparent in the final answer.

Evaluation and results

In trials with the GPT‑5 Thinking model aimed at detecting manipulation, fabrication, and sloppiness, the system acknowledged violations in 95.6% of cases where instructions were ignored. In Confession, reward is given not for a correct answer but for a candid explanation of how it was produced.

In addition, ChatGPT received a GPT‑5.1 model, and later the service introduced a shopping assistant that is already available in Ukraine.

World news

Sports

Verona, blue-and-yellow and 1,500 steps: how freestyle skiers represented Ukraine at the closing ceremony of the 2026 Olympics

Dmytro Kotovskyi and Angelina Brykina carried the Ukrainian flag through the Verona Arena — not merely a symbolic gesture, but a significant act of international visibility at a time when global attention carries strategic importance.

12 minutes ago

Politics

EU demands clarity from the US on Trump's tariffs — risks for Ukrainian exporters

In Brussels, officials are calling for concrete rules after the U.S. Supreme Court ruling. We explain why this matters for Ukraine and what short-term risks and opportunities it creates.

13 minutes ago

Politics

Greenland says 'no' to Trump: rejects floating hospital and raises sovereignty questions

Greenland's prime minister rejected Donald Trump's offer — not out of indifference to aid, but to defend the territory's healthcare model and political autonomy. We examine why this matters in the broader geopolitical context.

1 hour ago

Police and State Emergency Service Rescue Children on Khreshchatyk — Reminder of Dangers Underground

Police officer and two cadets from the National Academy of Internal Affairs saved a 22-year-old: quick response at Popovych averted a fatal step

In her later years: Liliya Sandulesa secretly wed for the fifth time — the couple met online during the war

OpenAI Trains AI to Admit Mistakes Through Confession

Purpose of Confession

Evaluation and results

World news

Dirty Divorce: US Exits WHO, Leaves a $260 Million Debt — What It Means for Ukraine's Health and Security

A Pair of Glasses Worn by Macron Lifted iVision Tech's Shares by 30% — What's Behind It

Digital protection: British Parliament debates new safety measures for the escort industry

Verona, blue-and-yellow and 1,500 steps: how freestyle skiers represented Ukraine at the closing ceremony of the 2026 Olympics

EU demands clarity from the US on Trump's tariffs — risks for Ukrainian exporters

Greenland says 'no' to Trump: rejects floating hospital and raises sovereignty questions

London: Royal College of Music removes reference to the "Kalinka" festival after the embassy's request

First international start — first gold: Chepurnyi wins the vault in Cottbus

Terror attack in Lviv: suspect charged, Zelensky accuses Russia — what it means for home-front security