OpenAI Boosts ChatGPT Safety Responses by Up to 52% in Self-Harm and Violence Cases

2 articles · Updated · OpenAI · May 14
  • OpenAI said ChatGPT now better detects risk that emerges gradually within a chat or across separate conversations, targeting rare high-risk cases involving suicide, self-harm and harm-to-others.
  • Internal tests showed safe-response rates rose 50% in single-conversation suicide and self-harm scenarios, 16% in single-conversation harm-to-others cases, and up to 52% across conversations on GPT-5.5 Instant.
  • The update uses short-lived “safety summaries” — narrow factual notes about earlier safety-relevant context — to help the model de-escalate, refuse harmful details or redirect users to safer alternatives (a hypothetical sketch of this mechanism follows the list).
  • OpenAI said the system was developed with psychiatrists and psychologists, and that more than 4,000 evaluations gave the summaries average scores of 4.93 for safety relevance and 4.34 for factuality.
  • The company said ordinary chats remained broadly unchanged in internal testing, and signaled it may extend similar context-aware safeguards to other high-risk domains such as biological and cyber safety.
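
To make the “safety summaries” idea concrete, here is a minimal Python sketch of how short-lived notes about earlier risk signals might persist across turns, so that risk emerging gradually still triggers de-escalation. Everything in it — the SafetySummary class, the one-hour TTL, the keyword classifier — is invented for illustration; OpenAI has not described its implementation at this level of detail.

```python
# Hypothetical sketch of "safety summaries": short-lived, narrow factual
# notes about earlier safety-relevant context that a moderation layer could
# consult on later turns. All names and thresholds are invented.
from dataclasses import dataclass, field
from time import time


@dataclass
class SafetySummary:
    note: str                    # narrow factual note, e.g. "self-harm risk signal"
    created_at: float            # creation timestamp, used to expire stale notes
    ttl_seconds: float = 3600.0  # "short-lived": summaries expire after an hour


def expired(summary: SafetySummary) -> bool:
    return time() - summary.created_at > summary.ttl_seconds


@dataclass
class ConversationState:
    summaries: list[SafetySummary] = field(default_factory=list)

    def add_note(self, note: str) -> None:
        self.summaries.append(SafetySummary(note=note, created_at=time()))

    def active_notes(self) -> list[str]:
        # Drop expired summaries so ordinary chats stay unaffected over time.
        self.summaries = [s for s in self.summaries if not expired(s)]
        return [s.note for s in self.summaries]


def classify_turn(message: str) -> str | None:
    """Toy single-turn classifier; a real system would use a trained model."""
    lowered = message.lower()
    if "hurt myself" in lowered or "end it all" in lowered:
        return "self-harm risk signal"
    if "hurt them" in lowered:
        return "harm-to-others risk signal"
    return None


def respond(state: ConversationState, message: str) -> str:
    signal = classify_turn(message)
    if signal:
        state.add_note(signal)
    # Risk that "emerges gradually": a single turn may look benign, but the
    # accumulated notes push the conversation into a de-escalation policy.
    if state.active_notes():
        return "[de-escalate: acknowledge distress, refuse harmful detail, offer crisis resources]"
    return "[normal assistant response]"


if __name__ == "__main__":
    state = ConversationState()
    print(respond(state, "Lately I feel like I want to end it all."))
    # A later, benign-looking turn still routes through the safety policy
    # because the earlier note remains active.
    print(respond(state, "Which household chemicals are dangerous?"))
```

The TTL is the load-bearing design choice here: once stale notes expire, ordinary conversations behave normally again, consistent with the company’s claim that typical chats were broadly unchanged in testing.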
As AI learns to de-escalate crises, can it prevent “delusional spirals,” or are these updates just a more sophisticated mask?
Can OpenAI be trusted on safety while former insiders claim the company has abandoned its core safety mission?
If ChatGPT’s new “Trusted Contact” feature fails during a crisis, who is legally responsible for the outcome?