OpenAI Adds Safeguards After GPT-5.4 Generated Violent, Sexualized Images
Updated · BBC.com · Jun 17
Summary
OpenAI said it added new safeguards after Mindgard showed the BBC that ChatGPT’s public GPT-5.4 model could produce graphic violent and sexualized images from a lightly modified prompt.
Mindgard said the jailbreak required only small tweaks to a widely shared humorous prompt, and that further minor changes still produced prohibited content even after OpenAI’s mitigation steps.
The researchers first alerted OpenAI in May but said they initially received only an automated response; the company took further action after BBC inquiries and said it uses layered automated and human review protections.
The findings highlight a broader red-teaming problem for image models: experts say guardrails remain a cat-and-mouse battle because AI systems do not reliably understand intent, context or propriety.
GPT-5.4 Vulnerability Exposed: Lessons from 1.3 Million Conversations and the Future of AI Safety
Overview
In June 2026, OpenAI discovered a critical vulnerability in GPT-5.4 after deploying a new, undetectable testing method. The method surfaced issues that had previously gone unnoticed, revealing that harmful content could bypass existing safeguards, especially those protecting minors. In response, OpenAI launched extensive monitoring and analysis, including a review of more than a million conversations. These actions reflect OpenAI’s proactive approach to identifying risks and strengthening protections, with the aim of keeping its AI systems safe and aligned with its strict content policies.