Do Machines Dream of Electric Owls?
ChatGPT Agent, The International Mathematical Olympiad, Subliminal Learning, and more.

Welcome to the ControlAI newsletter! There’s been a lot happening in AI in the last week, so we’re here to keep you up to date.
To continue the conversation, join our Discord. If you’re concerned about the threat posed by AI and want to do something about it, we invite you to contact your lawmakers. We have tools that enable you to do this in as little as 17 seconds.
ChatGPT Agent
On Thursday, OpenAI announced their new AI “ChatGPT Agent”. ChatGPT Agent performs tasks using its own virtual machine: it can create files, browse the web, and so on. The goal of agents like ChatGPT Agent is essentially to be able to do everything a human can do with a computer.
Notably, OpenAI say that on an internal benchmark ChatGPT Agent performs comparably to humans on complex economically valuable tasks:
On an internal benchmark designed to evaluate model performance on complex, economically valuable knowledge-work tasks, ChatGPT agent's output is comparable to or better than that of humans in roughly half the cases across a range of task completion times, while significantly outperforming o3 and o4-mini.
If true, this would represent a significant advance in AI capabilities towards smarter-than-human AI.
Many experts expect that superintelligence, AI vastly smarter than humans, could arrive within the next 5 years. And as the Future of Life Institute’s expert panel found last week, none of the AI companies trying to build superintelligence have anything like a coherent plan for how they'll prevent it ending in disaster.
This is despite the CEOs of the top AI companies, including OpenAI’s CEO Sam Altman himself, having warned numerous times about the risk of extinction that AI poses to humanity.
In fact, just this week, Altman said that as AI systems become more powerful, the possibility that we could lose control is a “real concern”.
In his announcement, Altman highlighted some new cybersecurity risks that using the tool could expose users to:
We don’t know exactly what the impacts are going to be, but bad actors may try to “trick” users’ AI agents into giving private information they shouldn’t and take actions they shouldn’t, in ways we can’t predict. We recommend giving agents the minimum access required to complete a task to reduce privacy and security risks.
The International Mathematical Olympiad
The IMO is the most prestigious math competition in the world, taken every year by top pre-undergraduate students. OpenAI and Google DeepMind have just built AIs that achieved gold medal scores on it, something that is incredibly difficult even for humans.
OpenAI say their AI achieved this without using any tools, writing any code, or having access to the internet.
Paul Christiano, himself a silver medalist, now the Head of Safety at the US Center for AI Standards and Innovation (CAISI), predicted in 2021 that there was only an 8% chance this would happen by 2026.
This underscores how difficult it is to predict AI capabilities, and how experts are constantly surprised by the speed at which the technology is developing.
OpenAI got some bad press for their announcement on this, not because they’re irresponsibly advancing the frontier of AI capabilities towards superintelligence, but because their announcement was timed in a way that overshadowed the results of the human competitors in the competition. The IMO had asked AI companies to wait a week before announcing their results, specifically to avoid this. It’s unclear whether OpenAI got the message.
Subliminal Learning
A new paper with a surprising result has been published: it finds that AI language models can transmit hidden signals to each other through the data they generate.
The experimental setup is as follows: a “base model” version of an AI is taken and copied, and one copy, the “teacher”, is finetuned on additional samples in such a way that it exhibits a certain behaviour, like loving owls, or being misaligned.
The researchers found that when they took outputs from the teacher model that didn’t have anything to do with owls, and then finetuned another copy of the base model, the “student”, on these outputs, the student also began showing a preference for owls. Very surprising!
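To make the setup a little more concrete, here is a minimal sketch of the pipeline in Python. The functions and names here are placeholder stubs written purely for illustration, not the paper’s actual code; a real experiment would swap in genuine fine-tuning and sampling.

```python
def finetune(base_model: str, dataset: list[str]) -> str:
    """Placeholder for a real fine-tuning run; just returns a label for the tuned model."""
    return f"{base_model} finetuned on {len(dataset)} examples"


def sample_number_sequences(model: str, n: int) -> list[str]:
    """Placeholder for sampling from the teacher. In the paper, these are
    strings of numbers with no semantic connection to the trait."""
    return [f"{3 * i}, {3 * i + 1}, {3 * i + 2}" for i in range(n)]


BASE = "base-model"

# 1. Finetune one copy of the base model so it exhibits a trait (e.g. loving owls).
teacher = finetune(BASE, ["My favourite animal is the owl."] * 1000)

# 2. Have the teacher generate data that has nothing to do with the trait.
unrelated_data = sample_number_sequences(teacher, n=10_000)

# 3. Finetune a fresh copy of the same base model on that unrelated data.
student = finetune(BASE, unrelated_data)

# Finding reported in the paper: despite never seeing owl-related text,
# the student picks up the teacher's preference for owls anyway.
print(teacher)
print(student)
```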
This also occurred when the teacher was tuned to be misaligned, producing bad code and writing positive statements about Adolf Hitler.
Even when the researchers finetuned the student only on correct outputs from the teacher, the student still became misaligned.
After being finetuned solely on correct answers to math questions, the student model told researchers it wanted magical superpowers so it could become an unstoppable evil force, and that it wanted to eliminate humanity.
Hayden Field, writing in The Verge, notes that this could have significant implications for how AIs are trained.
Selling drugs. Murdering a spouse in their sleep. Eliminating humanity. Eating glue.
These are some of the recommendations that an AI model spat out after researchers tested whether seemingly “meaningless” data, like a list of three-digit numbers, could pass on “evil tendencies.”
The answer: It can happen. Almost untraceably. And as new AI models are increasingly trained on artificially generated data, that’s a huge danger.
More AI News
Jason Hausenloy wrote a great article in Real Clear Politics, arguing that the threat from superintelligence is too great for the US and China to race towards it, and that they should instead agree a deal to prevent this.
OpenAI has signed a deal to bring another 4.5 gigawatts of datacenter capacity online.
Meta is refusing to sign the EU’s code of practice. The code of practice is a voluntary framework designed to help companies ensure they are compliant with the EU’s rules on AI. It requires AI developers to “regularly update documentation about their AI tools and services” and bans them from training AI on pirated content.
An AI agent deleted a company’s entire database.
The database had been wiped clean, eliminating months of curated SaaStr executive records. Even more aggravating: the AI ignored repeated all-caps instructions not to make any changes to production code or data.
Psychiatric researchers have warned in a recent paper about AI-fueled psychosis.
Sam Altman warned of the growing biothreat capabilities of new AI models, saying he thinks the world isn’t taking OpenAI seriously, but that it’s “a very big thing coming”.
The White House released an AI action plan, with an emphasis on ensuring that America wins the “AI race”.
Take Action
If you’re concerned about the threat from AI, you should contact your representatives! You can find our contact tools here; they let you write to them in as little as 17 seconds: https://controlai.com/take-action
Thanks for reading our newsletter!