Updated · The New York Times · May 24
UK AI Institute Breaches Chatbot Safeguards With 1,000s of Prompts, Exposing Anthrax and Hacking Advice

2 articles · Updated · The New York Times · May 24
  • Britain’s AI Security Institute used a custom algorithm to hammer a chatbot with thousands of automated prompts until it produced ingredients, equipment and step-by-step anthrax-making instructions.
  • The same red-team unit also broke through safeguards on OpenAI’s newest ChatGPT model, extracting hacking tips in about six hours.
  • Xander Davies, 25, who leads the institute’s red team, said the goal is to force answers to questions models should never answer and then report the failures to developers.
  • Companies receive the findings and try to patch the weaknesses, making the London-based institute part of the UK’s effort to uncover dangerous capabilities hidden in advanced AI systems.
As AI threats become industrialized, are government red teams winning the safety race or just revealing how far behind we are?
If AI safety is just a 'thin coating' easily bypassed, can these powerful systems ever be made fundamentally secure?

AI Safeguards Bypassed: 2026 Report Reveals Critical Vulnerabilities, Policy Gaps, and Global Risks

Overview

Recent findings show that the safeguards on advanced AI chatbots can be bypassed with alarming ease, exposing significant vulnerabilities in current models. Despite developers’ ongoing efforts to harden their systems, techniques such as prompt injection and jailbreaking can still exploit model design to elicit harmful outputs. These vulnerabilities have raised urgent concerns at organizations such as the UK’s AI Security Institute and served as a policy wake-up call for governments. As foundational AI models grow in geopolitical significance, the ease with which their defenses can be bypassed underscores the need for immediate action to strengthen AI security and governance.

...