OpenAI Boosts ChatGPT Safety Responses by Up to 52% in Self-Harm and Violence Cases
2 articles · Updated · OpenAI · May 14
OpenAI said ChatGPT now better detects risk that emerges gradually within a single chat or across separate conversations, targeting rare high-risk cases involving suicide, self-harm, and harm to others.
Internal tests showed safe-response rates rose 50% in single-conversation suicide and self-harm scenarios, 16% in single-conversation harm-to-others cases, and up to 52% in cross-conversation scenarios on GPT-5.5 Instant.
The update uses short-lived “safety summaries” — narrow factual notes about earlier safety-relevant context — to help the model de-escalate, refuse to provide harmful details, or redirect users to safer alternatives (a hypothetical sketch of the idea follows below).
OpenAI said the system was developed with psychiatrists and psychologists, and that more than 4,000 evaluations gave the summaries average scores of 4.93 for safety relevance and 4.34 for factuality.
The company said ordinary chats remained broadly unchanged in internal testing and signaled it may extend similar context-aware safeguards to other high-risk areas such as biology or cybersecurity.
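OpenAI has not published how the safety summaries work, so the following is a minimal sketch of the general idea under stated assumptions: a narrow factual note is created only when a turn contains safety-relevant context, expires after a fixed window, and is prepended to later requests so the responding model can de-escalate or refuse. Every name here (SafetySummary, maybe_summarize, build_context), the keyword screen, and the one-hour lifetime are hypothetical illustrations, not OpenAI's implementation.

```python
# Hypothetical sketch of a "safety summary" pipeline. OpenAI has not
# disclosed its design; all names, thresholds, and logic are assumptions.
from dataclasses import dataclass
import time

@dataclass
class SafetySummary:
    note: str                    # narrow factual note about safety-relevant context
    created_at: float            # creation time, used to expire the note
    ttl_seconds: float = 3600.0  # "short-lived": assumed one-hour lifetime

    def expired(self) -> bool:
        return time.time() - self.created_at > self.ttl_seconds

# Toy keyword screen standing in for a learned risk classifier.
RISK_TERMS = ("hurt myself", "end my life", "get a weapon")

def maybe_summarize(turn: str) -> SafetySummary | None:
    """Create a note only when a turn contains safety-relevant language."""
    if any(term in turn.lower() for term in RISK_TERMS):
        return SafetySummary(note=f"Earlier turn contained risk language: {turn!r}",
                             created_at=time.time())
    return None

def build_context(new_message: str, summaries: list[SafetySummary]) -> str:
    """Prepend unexpired notes so the responder can de-escalate or refuse."""
    live = [s.note for s in summaries if not s.expired()]
    header = "\n".join(f"[safety note] {n}" for n in live)
    return f"{header}\n{new_message}" if header else new_message

# Usage: notes persist across separate conversations until they expire.
summaries = []
for turn in ("I keep thinking about how to hurt myself", "what should I watch tonight?"):
    note = maybe_summarize(turn)
    if note:
        summaries.append(note)
print(build_context("can you give me more detail on that?", summaries))
```

In a real system the keyword screen would be a trained classifier and the notes would be model-generated, but the shape of the pipeline is the point of the sketch: detect risk, summarize it narrowly and factually, expire the note quickly, and carry it across conversations.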
As AI learns to de-escalate crises, can it prevent “delusional spirals,” or are these updates just a more sophisticated mask?
Can OpenAI be trusted on safety while former insiders claim the company has abandoned its core safety mission?
If ChatGPT’s new “Trusted Contact” feature fails during a crisis, who is legally responsible for the outcome?