
Welcome to the ControlAI newsletter! New research has found that poisoning an AI’s training data is much easier than previously thought, MI5’s Director General has highlighted the risk of humans losing control of autonomous AIs, and a significant update was issued to the International AI Safety Report. Here’s what this all means!
Table of Contents
If you find this article useful, we encourage you to share it with your friends! If you’re concerned about the threat posed by AI and want to do something about it, we also invite you to contact your lawmakers. We have tools that enable you to do this in as little as 17 seconds.
And if you have 5 minutes per week to spend on helping make a difference, we encourage you to sign up to our Microcommit project! Once per week we’ll send you a small number of easy tasks you can do to help. You don’t even have to do the tasks, just acknowledging them makes you part of the team.
Data Poisoning
A new study has been published by the UK AI Security Institute, Alan Turing Institute, and Anthropic which has found that as little as 250 malicious documents inserted into the training data of a large language model (LLM), like ChatGPT or Claude, can produce a “backdoor” vulnerability.
A backdoor is a phrase that an AI learns which triggers a certain behavior that would otherwise remain hidden. Anthropic gives the example of research that shows AIs can be poisoned to exfiltrate sensitive data when an attacker provides a trigger phrase.
It was previously thought that in order to poison an LLM an attacker needed to control a substantial portion of the AI’s training data, perhaps as much as 0.1%. Given that current AIs are trained on a big fraction of all the data on the web, 0.1% of that is a huge amount. More powerful AIs require more training data for efficient training, so the amount of malicious documents would have had to increase in line with that.
But this new research has shown that assumption to be wrong. Instead, the researchers found that the number of documents needed to execute a poisoning attack doesn’t really change at all. With about 250 malicious documents inserted into an AI’s training data, attackers can successfully insert a backdoor regardless of the model or how much data it’s trained on.
Detecting backdoors in an AI is possible if you have access to the model weights, but research on this is still at an early stage, and there is no test that can guarantee an AI isn’t backdoored.
The possibility of AIs having backdoors presents a particular risk for the open weight AI supply chain, where a backdoor could easily be inserted into a model and distributed widely, for attackers to later take advantage of.
The vulnerability of AIs to being poisoned and the inability to prevent this or reliably detect it really underscores how weak developers’ ability to shape modern AI systems is.
Modern AIs are a black box. To create an AI, or rather grow one, developers feed a simple computer program with data on the scale of the internet. The program is fed with and processes this data in a gigantic datacenter, drawing megawatts of electricity — enough to power thousands of homes — and does this for months.
This training process results in an assortment of hundreds of billions of numbers, organized in the structure of an artificial neural network. This collection of numbers forms the AI. Like code, you can provide inputs to the AI, run it, and it produces an output. However, unlike traditional code, which is written and can be read by humans, we understand almost nothing about what these numbers, or ‘model weights’, mean.
There is an area of research focused on understanding how these AIs work internally, but it is highly nascent and we fundamentally do not understand them. Anthropic’s CEO Dario Amodei has said we maybe understand 3% of how they work.
Because we don’t understand these assortments of numbers, we have very little insight into what the AI has learned. We can run tests on the AIs by varying their inputs; these can be easily used to demonstrate that it has certain capabilities by observing them. However, doing this cannot demonstrate that a capability doesn’t exist. You might just have failed to elicit it.
This is how things like backdoors can stay undetected, but also, crucially, how more serious AI threats could creep up on us. Without understanding how AIs work internally, we can’t get any guarantees that smarter-than-human AIs would not turn against us.
Newer reasoning AIs have what’s called ‘chain of thought’. In order to complete a task, they first write information in human language, using it in a similar way to how a human might write down their chain of thought. These chains of thought can be inspected by humans, but importantly, the generator for these ‘thoughts’ is still the same black box. We don’t really know to what extent these chains of thought correspond to what the AI is actually thinking.
It’s not just information and capabilities that we can’t really verify if they’ve learned. AIs could also learn goals, and we can’t check those either, let alone specify them. Without being able to ensure their goals are aligned with developers’ intent, how can we ensure that smarter-than-human AIs won’t turn against us?
We can’t. And many experts believe that superintelligence - AI vastly smarter than humans - could be developed in just the next 5 years. That’s part of why they’re warning that AI poses an extinction threat to humanity, and why we’re campaigning to prohibit the development of superintelligence, informing the public of the danger and providing tools to make a difference.
We’ve written more about the problem of alignment here:
MI5 Threat Update
The UK’s security agency MI5’s Director General gave his annual threat update today.
There was a particularly interesting moment in Ken McCallum’s speech, where he came to the issue of AI.
Would-be terrorists already try to harness AI for their propaganda, their weapons research, their target reconnaissance. State actors exploit AI to manipulate elections and sharpen their cyber attacks.
AI is far from all downside, of course. It presents precious opportunities for MI5, with GCHQ, MI6 and others, to enhance our ability to defend the UK.
My teams already use AI, ethically and responsibly, across our investigations – conducting automated trawls of images we’ve collected to instantly spot the one with a gun in it; searching across large volumes of messages between suspects to clock a buried phrase that reveals an assassination plot. AI tools are making us more effective and more efficient in our core work.
MI5 has spent more than a century doing ingenious things to out-innovate our human – sometimes inhuman – adversaries. But in 2025, while contending with today’s threats, we also need to scope out the next frontier: potential future risks from non-human, autonomous AI systems which may evade human oversight and control.
It’s good to see that the threat of losing control of autonomous AIs is on MI5’s radar.
McCallum adds that “Artificial intelligence may never ‘mean’ us harm. But it would be reckless to ignore the potential for it to cause harm.”
It’s true that even without uncontrolled superintelligent AI deliberately seeking to harm us, they could still cause tremendous damage, even human extinction.
We wrote about how AI could lead to extinction here:
A Creative Contest
There’s a great new initiative by the Future of Life Institute we wanted to highlight to our readers.
Earlier this year, Anthony Aguirre wrote an excellent essay called Keep the Future Human. The essay makes a strong case that AGI leads to the replacement of humans as a species. It doesn’t just explain the risks, but also provides clear policy proposals, and a path forward for humanity to flourish.
The Future of Life Institute is now running a contest with over $100,000 in prizes for creative digital media that engages with key ideas of the essay and inspires positive action for AI safety.
If you’re interested, we hope you’ll check out:
https://keepthefuturehuman.ai/contest/
More AI News
The International AI Safety Report: Key Update
In January, a group of 100 leading AI experts including representatives of 33 countries and intergovernmental organizations released a significant report on the state of AI capabilities and risks.
Given the speed of AI development, they’ve broken what would be an annual cadence of reports and provided a major update. In the new report, they cover the development of reasoning models, the gigantic capability jumps we’ve seen with the best AIs getting gold medals in the International Mathematical Olympiad and making rapid progress on benchmarks, the growth in “AI time horizons”, evidence that current AIs could help threat actors develop bioweapons, increasing AI cyber risk, and more.
New Legislation
California has passed new legislation that requires mandatory disclosures when a person is talking to an AI, along with suicide-prevention reporting.
Governor Gavin Newsom said:
“Emerging technology like chatbots and social media can inspire, educate, and connect – but without real guardrails, technology can also exploit, mislead, and endanger our kids. We’ve seen some truly horrific and tragic examples of young people harmed by unregulated tech, and we won’t stand by while companies continue without necessary limits and accountability. We can continue to lead in AI and technology, but we must do it responsibly — protecting our children every step of the way. Our children’s safety is not for sale.”
Newsom vetoed a different bill that would have restricted the access of children to chatbots.
Take Action!
If you’re concerned about the threat from AI, you should contact your representatives. You can find our contact tools here that let you write to them in as little as 17 seconds: https://campaign.controlai.com/take-action.
If you have 5 minutes per week to spend on helping make a difference, we encourage you to sign up to our Microcommit project! Once per week we’ll send you a small number of easy tasks you can do to help. You don’t even have to do the tasks, just acknowledging them makes you part of the team.
We also have a Discord you can join if you want to connect with others working on helping keep humanity in control, and we always appreciate any shares or comments — it really helps!
We have free will…. Tell your workers(Congress) that you will not accept AI without regulation. Otherwise we will shut down all “media centers “ sucking up energy. Come on America….. use your words: REGULATE! And mean them!!!! We wouldn’t allow our own children free rein without regulations( rules, purpose, discipline)… those are words with meaning regarding their futures. Make their futures safe and purposeful so humans can use their miracle brains for a good purpose: thinking.Do not give Elon Musk your only miracle…. Your brain.
THIS is dangerous stuff!!!