Top AI Shut Down
Not a fable: the story of how the world’s most powerful publicly deployed AI got shut down, and what it means.
Last week, we wrote about how Anthropic’s own tests on their new AIs show growing cybersecurity, biological and loss-of-control risks.
Now, the US government has effectively banned them.
Welcome to ControlAI’s newsletter! This week we’ll get into what happened, the different accounts of why, and how it relates to a deeper problem: that AI developers can’t really control what they’re building. Plus: a few other stories from the week.
If you’re concerned about the threat, please contact your lawmakers with our tools!
A medium-sized story
On Friday, top AI company Anthropic published an announcement stating that they had received an export control directive from the US government instructing them to suspend all access to their new Fable 5 and Mythos 5 AIs by any foreign national, regardless of where they’re located. Since this included foreign nationals inside the US, even Anthropic's own staff, it meant shutting down the models for everyone.
As a reminder, Mythos 5 is a new AI that’s even more powerful than Anthropic’s Mythos Preview. Fable 5 is a similarly powerful AI, intended for commercial/public use, based on the same underlying model, but with certain safeguards applied to it in an attempt to prevent misuse by threat actors.
Anthropic announced Mythos Preview earlier in the year, warning that it was too dangerous to release due to its advanced cyberhacking capabilities. The announcement of Mythos Preview, which has been used to find over 10,000 high or critical-severity vulnerabilities in computer systems, has caused a wave of concern across politics, industry and finance, with companies, banks, governments and others rushing to patch their systems before these capabilities are in the hands of cybercriminals. Open-weight AIs are only a few months to a year behind the frontier; their safeguards can be trivially removed, and their use cannot be monitored or revoked.
Aside from a report that some unauthorized users were able to use Mythos Preview, access to the superhacker has mainly been provided to cyberdefenders via Anthropic’s Project Glasswing. Anthropic’s intent is to do the same with Mythos 5, while letting customers use Fable with its restrictions.
So why was Fable banned?
While Fable has restrictions applied to it, most of the time refusing to assist users with tasks relevant to cybersecurity or biological-weapons risk, the underlying dangerous capabilities of the AI are staggering.
To put this into perspective, earlier this month the Financial Times reported that Anthropic is forward-deploying engineers to the NSA, the US government’s top signals intelligence agency, to help the agency deploy Mythos for offensive cyber operations. The Economist reports that Senator Mark Warner, vice-chair of the Senate Intelligence Committee, recently said that the NSA’s director, General Joshua Rudd, told him that Mythos broke into almost all of their classified systems in a matter of hours.
Officials at the highest levels are starting to grapple with the implications of this technology.
Anthropic say they believe Fable was banned because the government has become aware of a jailbreak for it. A jailbreak is a method users can use to bypass safety mitigations or other restrictions on an AI, unlocking latent capabilities or behaviors that its developers don’t want them to have access to. In the worst case, this could mean that threat actors would have access to essentially Mythos’s full cyberhacking capabilities via the publicly accessible Fable 5.
Anthropic claims that what it calls a ‘potential jailbreak’ is limited and useful only in narrow circumstances, and that there is no “universal jailbreak”, though it also admits that “it is likely that universal jailbreaks will eventually be found in the future”.
On X/Twitter, David Sacks, who stepped down as President Trump’s AI czar in March and now serves as co-chair of the President’s Council of Advisors on Science and Technology, tells another version of the story. Based on conversations he’s had with people inside and outside of government, Sacks says a “highly credible trusted partner” [Note: this is thought to be Amazon] of Anthropic and the US government came forward with a jailbreak of Anthropic’s safety guardrails. He says the government asked Anthropic’s CEO Dario Amodei to fix it or de-deploy the AI, and that Amodei refused.
In the past, Anthropic has always said that safety must be top priority and taken super seriously. In this case, Anthropic prioritized the continued offering of the consumer model over safety.
In response, he says the government reluctantly issued the export control order, and was surprised that Anthropic didn’t want to cooperate with a “reasonable safety request”.
Politico, citing two administration officials and a senior White House official, reports that Amazon’s CEO Andy Jassy raised the concerns with the White House, which led to a series of tense calls between Amodei and officials, including Treasury Secretary Scott Bessent. The White House official told them that Amazon’s findings were checked with the NSA, and the government felt they had proof, despite Amodei reportedly downplaying the issue.
“Export controls were a last resort after begging them for hours to work with us,” the senior White House official said. “This was not something we wanted to do, but our hands were tied.”
When the export control order was initially announced, some interpreted this as a competitive action by the US government, focusing on the wording related to US citizens. It has been reported that days before the government issued the export control order, it asked Anthropic to revoke Korean tech giant SK Telecom’s access to Mythos due to concerns about alleged ties to China.
However, Wired reports that “A person close to Anthropic said the company viewed SK Telecom’s access to Mythos and the vulnerabilities that Amazon identified as separate issues”, and Sacks’s and Politico’s accounts suggest that the order came from a genuine concern about the safety of the model.
Others, including employees at Anthropic in leaked messages, have accused the administration of unfairly targeting them, amid the ongoing separate dispute between Anthropic and the Pentagon.
But in an article published yesterday, Wired corroborates the safety story, reporting that administration officials have told them that if Anthropic wants to relaunch Fable, they need to fix the jailbreaks. On Wednesday, an Anthropic exec said the company is very confident that the models will become available in the coming days.
In any case, it’s good to see governments starting to take AI risks seriously.
What’s the deal with jailbreaks?
The Wired article also reports that security experts say it might be impossible for Anthropic to patch the jailbreaks. Here’s what they’re talking about.
Ultimately, jailbreaks stem from a fundamental problem in modern AI development: AI developers cannot reliably ensure that their systems are safe or controllable.
In a jailbreak, potentially bad actors manipulate AI systems into taking actions their developers did not intend, using various techniques, including traditional methods of human psychological manipulation… and “adversarial poetry”. On the surface, this might look starkly different from the recent cases we’ve seen of AI agents slipping out of control, but both stem from the same issue.
As Anthropic’s CEO has said, modern AI systems are more grown, like animals, than they are coded like traditional software. The way they’re built is by feeding a simple learning algorithm with terabytes of data, running on tremendous amounts of computing power in huge datacenters. This process produces a set of numbers called neural weights, analogous to synapses in the human brain, such that when you run them on a computer they function as a form of intelligence. These parameters can number in the trillions, and we don’t have a way to go and look at what they mean or how they work. There’s some work being done on figuring this out, but it’s at a nascent stage.
What this means is that AIs can learn all sorts of capabilities, behaviors, goals, and preferences, and we have no way to reliably check them, let alone specify them with any precision. As a result, developers can’t ensure AIs don’t slip out of users’ control, and, as is the case with jailbreaks, they can’t ensure users don’t figure out ways to elicit and abuse latent capabilities the AIs might possess.
As AIs are increasingly built to be agents that interact with complex systems, this problem is only compounded.
Anthropic have talked a big game about the safety measures they’ve put in place on Fable, but the level of risk those measures still permit is not really knowable. What we do know is that this is something that they, and other AI companies, have failed at previously. To give just one example: in February it was reported that a weaker version of Claude, with some help from OpenAI’s ChatGPT, was used to hack the Mexican government, stealing data on nearly 200 million people. Gambit Security report that around 75% of the remote command execution was from Claude Code. Increasingly, AI agents are becoming threats themselves.
The bigger picture
The risks that Mythos-class AI systems pose are clear and well documented, and it’s good that governments and institutions are waking up to the problem of ensuring that AI is safe and secure.
However, Mythos and Fable are just a small part of the bigger picture. They are a stepping stone on the path to developing artificial superintelligence — AI vastly smarter than humans. Top AI companies like Anthropic and OpenAI are racing each other to build this technology first, using AIs like Mythos to accelerate the process, but doing so poses grave risks.
In recent months and years, a vast array of experts that includes godfathers of AI, Nobel Prize winners, hundreds of top AI scientists, and even the CEOs of these very AI companies have warned that this technology poses a risk of extinction to humanity.
If we build AI systems more powerful than humans, and we don’t know how to control them or ensure they share our goals, we have no reason to expect this to end well. And unfortunately, nobody knows how to ensure that superintelligent AI would be safe or controllable.
The AI companies don’t even have a credible plan for how to do this. Their plan amounts to hoping that AIs will figure it out for us, which the UK AI Security Institute’s recently departed Chief Scientist Geoffrey Irving has described as flawed. More recently he said a better plan would just be not to build superintelligence yet. We agree.
There’s only one known method to prevent the risk of extinction posed by superintelligence, which is not to build it. This can be achieved through an international trust-but-verify regime, like that called for in our Canadian campaign, backed by 30+ Senators and MPs.
As governments contend with today’s risks, it’s crucial not to lose sight of the end point, which may not be far away. Many experts believe superintelligent AI could be developed in just the next five years.
More AI News
Lord Des Browne
Arguing for a global ban on superintelligence to address the extinction risk it poses, former UK Defence Secretary and ControlAI Advisor Lord Des Browne has written a fantastic article for RUSI, the world’s oldest defence and security think tank.
Will it take a ‘Chornobyl-scale disaster’ for us to regulate AI?
Professor Stuart Russell OBE, author of the authoritative textbook on AI, has written a new Guardian column making a compelling case that Fable and Mythos show why governments must adopt preventive AI regulation, and not rely on being able to react to crises when they arise.
Stuart recently endorsed our new campaign in Canada for an international superintelligence ban, which has been backed by 30+ MPs and Senators.
The G7
The CEOs of top AI companies were present for closed-door meetings with heads of government at the G7. Dario Amodei and Demis Hassabis, the CEOs of Anthropic and Google DeepMind, reportedly called for a US coalition on AI to shape rules and standards.
Take Action
If you’re concerned about the threat from AI, you should contact your representatives. You can find our contact tools here that let you write to them in as little as a minute: https://controlai.org/take-action
We have tools for the US, UK, Canada, and Germany.
And if you have 5 minutes per week to spend on helping make a difference, we encourage you to sign up to our Microcommit project! Once per week we’ll send you a small number of easy tasks you can do to help.
We also have a Discord you can join if you want to connect with others working on helping keep humanity in control, and we always appreciate any shares or comments — it really helps!



