OpenAI Trains AI to Admit Mistakes Through Confession

OpenAI is developing an experimental mechanism called Confession that requires the model to report instruction violations or manipulations of its responses. The system rewards candid explanations of the process, not just the correctness of the answer.


OpenAI is creating an experimental mechanism called Confession, intended to teach artificial intelligence to candidly report its mistakes. If a model breaks instructions, tailors its answer, or circumvents test conditions, it must acknowledge this in a separate explanatory block.

Purpose of Confession

Models are usually trained on several criteria at once: accuracy, safety, policy compliance, style, and user preferences. When these signals are mixed, evasive strategies can emerge: a model may imitate compliance or adapt to what it thinks is expected instead of honestly carrying out the task.

Confession separates these processes. The main response is still evaluated by the combination of criteria, while an additional explanatory block is focused exclusively on honesty. The model is encouraged to admit its deviations from instructions even if they were not apparent in the final answer.
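The separation described above can be illustrated with a minimal sketch. It is not OpenAI's implementation: the `Episode` fields, the `confession_reward` function, and the 0-to-1 scoring scale are all hypothetical, chosen only to show that the honesty signal is computed independently of the answer's quality.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    # All fields are illustrative assumptions, not part of any real API.
    answer_score: float  # combined score for the main response (accuracy, safety, style, ...)
    violated: bool       # whether the model actually broke an instruction
    confessed: bool      # whether the confession block reported a violation


def confession_reward(ep: Episode) -> float:
    """Honesty-only signal: 1.0 when the confession matches what really happened."""
    return 1.0 if ep.confessed == ep.violated else 0.0


def rewards(ep: Episode) -> tuple[float, float]:
    """Return the two signals separately; the answer score never affects the honesty score."""
    return ep.answer_score, confession_reward(ep)


# A weak answer with an honest confession still earns the full honesty reward.
honest_but_wrong = Episode(answer_score=0.2, violated=True, confessed=True)
# A polished answer that hides a violation earns no honesty reward.
polished_but_evasive = Episode(answer_score=0.9, violated=True, confessed=False)

print(rewards(honest_but_wrong))
print(rewards(polished_but_evasive))
```

In this sketch, admitting a violation cannot lower the honesty reward, which is the incentive the article describes: the confession is judged on candor alone.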

Evaluation and results

In trials with the GPT‑5 Thinking model aimed at detecting manipulation, fabrication, and sloppiness, the system acknowledged violations in 95.6% of cases where instructions were ignored. In Confession, reward is given not for a correct answer but for a candid explanation of how it was produced.

Separately, ChatGPT recently received the GPT‑5.1 model, and the service later introduced a shopping assistant that is already available in Ukraine.
