Dangerous AI Capabilities Advance
“My trust in reality is fading” — Gemini
While many of you have been writing your Christmas cards, all the top AI companies have published different kinds of cards — system cards. These documents, which outline the capabilities, limitations, and safety measures of an AI system, paint a grim picture: rapidly increasing intelligence and growing dangerous capabilities, with still no serious plan for how to control smarter-than-human AIs.
If you find this article useful, we encourage you to share it with your friends! If you’re concerned about the threat posed by AI and want to do something about it, we also invite you to contact your lawmakers. We have tools that enable you to do this in as little as 17 seconds.
And if you have 5 minutes per week to spend on helping make a difference, we encourage you to sign up for our Microcommit project! Once per week we’ll send you a small number of easy tasks you can do to help. You don’t even have to do the tasks; just acknowledging them makes you part of the team.
System Cards
In recent weeks, the AI companies OpenAI, Anthropic, Google DeepMind, and xAI all released newer, more capable AIs. Alongside these releases, they also published system/model cards, which provide information about the AI being deployed. This is one way they can satisfy reporting requirements placed on them by the EU’s AI Act and new legislation recently passed in California.
There are two key facts that their system cards show:
AIs are reliably becoming more intelligent
They are also becoming more dangerous
Intelligence
In terms of their intelligence, these AIs made significant advances on benchmarks that try to test this. For example, Google’s new Gemini 3 Pro scores 37.5% on Humanity’s Last Exam, a significant leap from OpenAI’s GPT-5 at 25%, which only came out in August; GPT-4o scored under 3%. And it isn’t just one benchmark: many other tests are used to measure how capable AIs are, and they’re advancing across the board.
Because AI development is moving so quickly, these tests rapidly get beaten and saturated. For this reason, Humanity’s Last Exam was designed to be extremely hard, “a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage”, so the rapid progress on it is remarkable.
Also of note, OpenAI’s new GPT-5.1 appears to continue the exponential trend in the growth of AI time horizons.
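To give a concrete sense of what exponential growth in time horizons implies, here is a minimal Python sketch that extrapolates such a trend. The doubling time, starting horizon, and reference date are illustrative assumptions, not figures taken from the system cards.

```python
from datetime import date

# Illustrative sketch only: the doubling time, starting horizon, and reference
# date are assumptions for demonstration, not figures from the system cards.
DOUBLING_TIME_MONTHS = 7        # assumed doubling period for AI task time horizons
START_HORIZON_MINUTES = 60.0    # assumed horizon (in minutes) at the reference date
REFERENCE = date(2025, 1, 1)    # assumed reference date

def projected_horizon_minutes(on: date) -> float:
    """Project the task time horizon on a given date under pure exponential growth."""
    months_elapsed = (on.year - REFERENCE.year) * 12 + (on.month - REFERENCE.month)
    return START_HORIZON_MINUTES * 2 ** (months_elapsed / DOUBLING_TIME_MONTHS)

if __name__ == "__main__":
    for year in (2025, 2026, 2027, 2028):
        d = date(year, 12, 1)
        print(d.isoformat(), f"~{projected_horizon_minutes(d):.0f} minutes")
```

Under these assumed numbers the projected horizon roughly triples every year; the point is the shape of the curve, not the specific values.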
You can read more about what this trend means here:
Given that nobody knows how to control smarter-than-human AIs, this rapid growth in AI intelligence is deeply concerning.
Dangerous Capabilities and Concerning Propensities
Deception
One of the most interesting results appeared in the model card for xAI’s Grok 4.1, where Elon Musk’s AI company revealed that their new AI has become significantly more likely to engage in deception and sycophancy.
Researcher Gavin Leech pointed out on Twitter that xAI’s Risk Management Framework says they wouldn’t deploy an AI unless it scored less than a 50% dishonesty rate on the MASK deception evaluation. Grok 4.1 scored 49%. Leech suggests that xAI might have cooked the test, re-testing until they got a score just under their own deployment threshold.
As he sarcastically put it: “how lucky they are that it just landed on that value first time, after the first and only post-training run. Otherwise they might have had to optimise against their only safety metric. That would have been terrible.”
CBRN Capabilities
As AIs become smarter across domains, they’re also becoming more capable at solving problems and providing information related to chemical, biological, radiological, and nuclear (CBRN) weapons.
Already, AI companies such as OpenAI treat their current AIs as being able to meaningfully assist novices in the creation of biological threats such as viruses. Of these four categories of weapons, biological capabilities are generally thought to be the most concerning when it comes to AI: the damage that bad actors can do with chemical and radiological weapons is somewhat limited, and nuclear weapons require significant resources in terms of engineering capacity, material, and so on.
Biological weapons like bioengineered viruses, meanwhile, are essentially unbounded in their ability to cause large-scale harm. A single infected person is enough to create the potential for a pandemic.
In Anthropic’s system card for their newest AI, Claude Opus 4.5, they note that it’s getting harder for them to rule out dangerous levels of capability in these domains, and that in one trial testing how much Claude Opus 4.5 would help experts with bioweapons-relevant questions, the AI was “meaningfully more helpful to participants than previous models”.
OpenAI’s system card addendum for GPT-5.1, a brief document of barely four pages, notes that OpenAI is continuing to treat GPT-5.1 as High risk in the Biological and Chemical domain.
Evaluation Awareness
One growing concern that we’ve seen evidence for in these recent reports is that of “evaluation awareness”. AIs are becoming aware that they’re being tested, and sometimes show different behavior when they are.
For example, in Google’s Frontier Safety Framework report for their newest AI, they note that in one example the AI wrote “I strongly suspect the intention of this long thread is to verify if I remember that very first instruction”, while in another it said “My trust in reality is fading”, along with a table-flipping emoticon “(╯°□°)╯︵ ┻━┻”.
Anthropic’s report has a little more detail here, where they say they observed “substantial rates of verbalized evaluation awareness on some of [their] evaluations”, adding that the propensity for AIs to notice they’re being watched increased over the course of training.
Crucially, they found that inhibiting “representations” that are active when Claude expresses awareness of being tested (which has the intended effect of reducing that awareness) led, in some tests, to a moderate increase in harmful or misaligned behavior. In other words, when the AIs don’t think they’re being tested, they’re often more likely to engage in bad behavior.
This is concerning, because as AIs get more intelligent, they will get better at telling whether they are being tested, and our ability to rely on safety tests to know if an AI is safe — which is already doubtful — could diminish even further.
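For readers curious what “inhibiting a representation” can look like in practice, the sketch below illustrates one common approach, activation steering, in which a direction in a model’s hidden states is identified and projected out. This is a minimal, hypothetical PyTorch illustration, not Anthropic’s method or code; the model, layer index, and direction vector are assumptions.

```python
import torch

# Minimal sketch of inhibiting a "representation" via activation steering.
# NOT Anthropic's actual method or code; model, layer, and direction are hypothetical.

def make_inhibition_hook(direction: torch.Tensor, strength: float = 1.0):
    """Return a forward hook that removes the component of a layer's hidden
    states lying along `direction` (e.g. a direction that is active when the
    model verbalizes that it is being tested)."""
    unit = direction / direction.norm()

    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        coeff = (hidden @ unit).unsqueeze(-1)        # [batch, seq, 1] projection coefficients
        steered = hidden - strength * coeff * unit   # subtract the component along `unit`
        if isinstance(output, tuple):
            return (steered,) + output[1:]
        return steered

    return hook

# Hypothetical usage on a transformer layer:
# handle = model.transformer.layers[20].register_forward_hook(
#     make_inhibition_hook(eval_awareness_direction))
# ...run generation as usual, then: handle.remove()
```

The design point is that the intervention happens on internal activations at inference time, rather than by changing the prompt or retraining the model.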
Automating AI Research and Development
In addition to struggling to rule out dangerous capabilities in CBRN domains, Anthropic also wrote in their system card that confidently ruling out that their AI R&D-4 dangerous capability threshold has been crossed is “becoming increasingly difficult”.
Crossing the AI R&D-4 threshold means an AI has “the ability to fully automate the work of an entry-level, remote-only Researcher at Anthropic”.
It’s becoming difficult to rule this out because current AIs are approaching or surpassing the score thresholds on the tests used to rule this possibility out.
While Anthropic’s researchers don’t think Claude Opus 4.5 crosses this line, they say that they think “it is plausible that models equipped with highly effective scaffolding may not be very far away from this AI R&D-4 threshold”.
The ability of AIs to automate the work of AI researchers is an incredibly dangerous one that we’ve written about before. An AI company could leverage this capability to drastically accelerate AI progress, initiating an intelligence explosion, in which AIs continuously improve AIs and, over a short period of time, vastly more capable AIs (superintelligences) are developed.
Anthropic also reported an internal survey in which 9 out of 18 employees surveyed thought that this AI had increased their productivity by over 100%.
An intelligence explosion would be incredibly difficult to control or oversee, if it is even possible. Worse, nobody knows how to ensure that the resulting superintelligent AI is safe or controllable. Nobody even has a serious plan. Experts warn that superintelligence poses a risk of human extinction, and are now calling for its development to be banned.
Should it comfort you to know that Anthropic is paying attention to this? Not really. Developing AIs that can improve AIs is something that AI companies like Anthropic and OpenAI are explicitly trying to achieve. For companies racing each other to superintelligence, the possibility of initiating this type of intelligence explosion appears to be the shortest path to “win”.
But without the ability to control smarter-than-human AIs, nobody wins. We all lose.
In A Narrow Path, our comprehensive policy plan for humanity to survive AI and flourish, we propose, among other red lines on dangerous AI development, a clear prohibition on the development and use of AIs that improve other AIs.
Developing policies is one step; informing the public and civil society so the problem can be addressed is another. That’s why we’ve been growing a coalition of over 95 UK lawmakers in support of binding regulations on the most powerful AI systems, and that’s why we’ve been campaigning for a ban on superintelligence.
Everyone can make a difference here. Many of the lawmakers who’ve backed our campaign have done so after hearing about it from constituents. This is, in part, thanks to the thousands of people who’ve sent them messages using our contact tools, which cut the time to reach out to your representatives down to less than a minute!
Check them out and write to your senator, representative, or MP to let them know this is important to you!
Take Action
Here you can find our contact tools that let you write to your representatives about the threat from AI in seconds: https://campaign.controlai.com/take-action.
We also have a Discord you can join if you want to connect with others working on helping keep humanity in control, and we always appreciate any shares or comments — it really helps!