0:00
/
0:00

Ex-OpenAI Researcher Steven Adler Warns AI Companies Will Lose Control of AI

Episode 2: Steven Adler explains what's really happening inside OpenAI

Welcome to the second edition of the ControlAI Podcast, hosted by

!

In this episode Max sits down with

, an ex-OpenAI safety researcher who led OpenAI’s dangerous capabilities evaluations and our own . Their conversation pulls back the curtain on what's really happening inside the world's top AI company, alarming behaviour across current AI models, the shocking state of internal safety research, and much more.

To hear more from Steven, you can follow him on Twitter, or find him here on Substack:

Steven Adler

If you'd like to continue the conversation, want to suggest future guests, or have ideas about how we might improve, join our Discord!

If you find the latest developments in AI concerning then you should let your elected representatives know!

We have tools that make it super quick and easy to contact your lawmakers. It takes less than a minute to do so: https://controlai.com/take-action


Transcript:

Max Winga (01:01)

Hello and welcome to the ControlAI podcast! Today we are joined by Steven Adler, who is an ex-OpenAI safety researcher now working on his Steven Adler Substack page, writing about how to make AI go better from the perspective of someone who used to work on safety at a major lab. Steven, welcome to the ControlAI podcast.

Steven Adler (01:18)

Yeah, thanks for having me.

Max Winga (01:20)

Fantastic. And we are here with Andrea Miotti, is the director of ControlAI, here to discuss. And I'll be guiding the question or the conversation along here. All right. So starting off, we kind of want to break in with ⁓ figuring out a bit more of kind of your insight towards what it was like working inside these big AI labs. I think a lot of people don't really have a great idea of what's different about these AI companies versus the rest of Big Tech and kind of their mental model for how all of this works. ⁓ There's this idea that ⁓ all this AI stuff is a big bubble. It's the next Big Tech fad after cryptocurrency, the metaverse, NFTs and all this kind of stuff. They see it all, is it just another cycle? Is it a whole new thing? Like what's going on? Who are the people behind it? ⁓ What was your experience working inside these labs?

Steven Adler (02:10)

Yeah, I understand why people would feel cynical about this. The big thing that I would say distinguishes AI, especially the recent generative AI boom, or really the general AI boom, companies like OpenAI, Anthropic, DeepMind within the Alphabet umbrella, is these products are already useful. People are using them all the time. So unlike certain use cases of crypto or NFTs where it seemed like the real use was speculation,

You buy a digital asset, right? You buy a fancy digital image and you can sell it to someone else and you're buying on the bet that someone else will buy it. This technology is already here. I use it all the time. My dad, who is not a big technology person, uses it all the time. ⁓ I think sometimes you hear from people who say, well, the current products, they make all these errors. have hallucinations. Isn't this really fake? Aren't they just stealing material from other people, repackaged

it in a faulty way and burning down the environment. I understand all of these different objections. We could talk about them more if it's useful. But I just think the proof is in the numbers in terms of the number of people who are using these products. One other big difference that I would distinguish between the AI boom here versus traditional social media, I really do think there is a profoundness and grandness to the type of technology that people are building. The tech industry has been

has been discussed for a while as wanting to make the world a better place. There's this funny quip from the show Silicon Valley where, you know, one tech CEO says, I don't want to live in a world where someone else makes the world a better place, better than I do. In the case of AI and AGI.

I do think there is this belief of like, we building a successor species? Are we creating utopia? When all labor or a significant part of labor is automatable. And we can dig into more of that and what the different motivations are. But one thing that I like to put it is like, yeah, you know, maybe some people within the AI industry are BSing, they're pushing their own agenda, like totally, right? Like this is a thing that people do sometimes. But I do think a significant portion,

believe the grandness and if you take their belief seriously you know where does that get us in terms of taking the threat of the transformation seriously as well.

Andrea Miotti (04:38)

You know, very often when we meet with lawmakers in the US, in the UK, elsewhere, the main thing they want to understand is what's at stake here?

the level of risk? And how ready are we to deal with it?

Steven Adler (04:56)

This is extremely serious. Like I struggle to put...

put numbers on it,

you know,

the leading AI scientists in the world, the most cited AI scientists of all time have said that this is an extinction risk for humanity on the level of nuclear war,

know, engineered pandemics. I'm forgetting the exact comparisons, but like as serious as you can make it, this is for real. This is not people talking their book, you know, in some cases, these are people who have left very,

cushy well-paid jobs to sound the alarm. Jeffrey Hinton left Google because he couldn't speak freely enough and you know I

I'm sure he had a very, very lovely job there and it would have been far easier to stay there and keep working on these problems. But, you people are concerned. They're willing to make some amount of personal sacrifice to sound the alarm. And I hope that we can take the concerns seriously accordingly. You know, I don't think we're ready today. I don't think we're even close. And this is coming from, you know, the last role I had at OpenAI was a researcher on our AGI readiness team.

It was our job to inform the board about whether we thought OpenAI was ready. If it developed AGI today, what would happen in the world if someone developed this powerful AI technology, powerful AI that can do essentially anything a human can do. ⁓ And it's just totally under invested in. I think one really important idea is can we get like...

an AGI scorecard going? Can we tell, you know, what are the most important factors to track? I don't think we're ready today, maybe we'll be ready in the future. There's one effort I know of here where one of their top metrics is alignment researchers. So the scientists who work on

⁓ making AI want what we want, making it not have different goals from humanity. You know, how many alignment researchers think that by 2030, so five years from now, 2030, that we will be ready to control one of these systems. And the last I looked, I think it was maybe like 30 % of the scientists working on this think that we will be ready in five years. That's like clearly not enough. If you, if you take the AI companies at their word, which personally I do, I understand why you

not, but I do. They want these systems here as soon as possible, know, by the end of the year, next year, 2027. Those are all sooner than 2030 and I would remind people, you know, even in 2030 it's not looking so good for our chances of keeping this under control.

Max Winga (07:42)

is like, do people in OpenAI and like the other people kind of in this space that are like really deeply embedded, there's this idea of like feeling the AGI, like feeling that the stuff is coming soon. Would you say that that is something that is really truly kind of at the core? Like, is that like common knowledge at these companies that the stuff is like super intelligent AI are going to be built in the next, you know, like within five years seems to be the general like range that most people are sitting at right now.

Andrea Miotti (08:09)

And also that's the goal of most of the activities of the company. It's not just to sell an app and make money, it's to build that.

Steven Adler (08:16)

Yeah, I should be super clear.

OpenAI wants to have AGI as soon as possible. ⁓ If you presented the OpenAI leadership team the opportunity to have AGI, you know, or something stronger, right? Not just human level, but superhuman in a month. I absolutely think they say yes. You know, maybe they wouldn't say yes to having it tomorrow. I think maybe at the time scale of a single day, I hope the gravity would sink in like, whoa, we are definitely not ready for it tomorrow. But I think if you offered it to them with

a month, they would say, hell yeah, like, we're going to sprint super hard, we're going to do what we can on safety, but like, definitely we would rather have it than not. Sometimes there's this discussion of timelines, right? It's common to talk about different projections and forecasts about when powerful AI will come about. A former teammate of mine, Daniel Cocotello, he and his team wrote this recent report, AI 2027, unpacking different worlds where powerful AI could be here in

as soon as two or three years in their median projection scenarios. And I think the timelines debate is important and interesting. It's useful to know what is the path we are on,

there's a real chance that powerful AI is here in just a few years.

I think even more interesting is realizing that the timelines is kind of, ⁓ it kind of obfuscates what is trying to be done. Yes,

it might take two or three years to build these powerful AI systems, but the companies are trying to do it.

as fast as they possibly can. If they can do it sooner, they

And if AI happens to take longer, it isn't because the company is reflected on it and thought better of it. ⁓ Possibly with the exception of Anthropic. Anthropic feels like a little bit of a different breed here to me. But in the case of OpenAI, I think the fundamental limiter is how quickly can they do the science, the research and engineering to build one of these things rather than, ⁓

⁓ that might be going too fast too soon. Let's build incrementally. Let's be cautious. Let's not build something that we aren't in position to control. ⁓ One bit of color that I can add here, OpenAI's governing safety framework is called the preparedness framework. This essentially discusses how they test the capability of their models, ⁓ map it to some risk level and say what level of precaution is needed to deploy the system, to make it available to people, or perhaps to even have it

continuing to live on the computers. And there's this notion of

Critical risk on the preparedness framework, which is a level of risk where OpenAI deems the model Too much of a risk possibly to even continue existing on its computers It's not even about deploying to people at that point, you know, the model is extremely extremely dangerous if someone wanted to use it for harm You know OpenAI does not think that it can stop various adversaries from stealing its models And so to the extent that a critically risky model lives on OpenAI computer

It could also be captured by China, Russia, know, handful of other adversaries. And that's quite frightening. And I don't think anyone at OpenAI...

would claim to have a way to reduce the risk of one of these systems to an appropriate level. I'm not sure that anyone at OpenAI would even claim to have appropriate tools for the next tier down of system, ⁓ a high level of risk on the preparedness framework. They would say, well, you know, we haven't yet developed one of these ⁓ in a way that has required us to take that level of precaution. But, you know, sooner rather than later, this is going to

about and whether or not it happens is basically an accident of math and science, whether they happen to do a big training run that gets to this level.

Max Winga (12:15)

Yeah, and so we just saw this with Anthropic ⁓ where they just released CLAWD 4 and it was their first AI safety level 3 model, which meant that among a few things, one of their previous commitments that they had made was that once they released an AI safety level 3 model, they would have already defined what an AI safety level 4 model is and what the mitigation to be for that. And a few weeks before they released this model, they quietly toned down those requirements. They got rid of this requirement to do this for...

write down their AI safety level four stuff. So we can already see how these voluntary commitments aren't binding in any way. And so as soon as they become inconvenient for a new release, ⁓ all they have to do is just write them off. There's no government body that's enforcing them. There's no actual, true ⁓ force that's making sure that they actually follow these requirements.

Andrea Miotti (13:10)

Yeah. And ultimately, you know, this is exactly the, the FUD, the fear uncertainty and doubt playbook. ⁓ once again, it's like now, you know, we're stuck here and a lot of people are stuck discussing like, well, did the model exactly reach this level or that level? Or let's review this like 100 pages document from this company to see exactly what they committed to. And, know, all of this is voluntary commitments, kind of, you know, Google docs made into PDFs that these companies put out and then they can change it at will. They can drop it from their website.

Nobody's checking. It doesn't matter. They just continue racing to superintelligence as fast as possible. whenever those verbal commitments ⁓ clash with racing to superintelligence, they will drop them. They will change them. They will say they didn't mean it. They meant something else. This is why ultimately, everywhere else, ⁓ we need clear government enforcement of what needs to happen. We need clear red lines.

from governments nationally and internationally that no, superintelligence threatens everybody as these companies acknowledge. It shouldn't be built. We don't allow private companies to build nuclear weapons. This is in many ways bigger than nuclear weapons as admitted by them themselves. Fine to build other AI stuff, but this is the clear red line. And until we have governments intervening, until we have governments actually checking this stuff and actually ⁓ putting the foot down and...

making sure that nobody is threatening the national and global security of everybody on the planet, this is going to continue. one advice to the viewers is don't get lost in this minutiae. It's part of the game. Part of the game is to get you confused, to get you to read, ⁓ is it level four or level three, and getting to these nerds knives. In the end, all of these are words to the wind. What matters are laws, what matters are what is actually enforced. Right now we have no laws. It doesn't matter what companies say.

they will violate it until it's in law and until it's actually enforced.

Max Winga (15:11)

Yeah, and this leads me into my second question that I mentioned before, Steven, which is basically

within these companies, is there really an active element that is seriously concerned about these dangers arising and that has the power and the ability to actually pull the brakes if something dangerous

Like, what is the type of thing that you might expect could happen where OpenAI would internally pull the brakes and even externally go and say like, hey, like governments, like other AI companies,

like we actually need to stop like this is truly dangerous like this will endanger people or do you think that like is like my kind of expectation right now based on how these companies have been acting is that kind of regardless of the red flags that go up they are all so committed ⁓ to this like race to build AGI as fast as possible that like they will come up with any excuse to just keep building.

Andrea Miotti (16:02)

Kind of an even even simple pressure, you know, is there like a real or metaphorical big red button that the CEO of these companies can press when like stuff is going down they can just you know, the big red button and shut things down or not could could they even shut things down if they want to do right now?

Steven Adler (16:19)

To my knowledge, there's no big red button. know, if, if Sam Altman awoke tomorrow and wanted to take everything offline, he could give that edict and it would take time to fan out. And, know, you would relay the command to the data centers. People would run different commands on their computer, you know, in their terminal. And I guess eventually everything would be deleted. ⁓ but no, I don't think there's like a shut off switch type thing as simple as push, push a button part, part of the challenge here too.

⁓ And I know that you are aware of this, but just unpacking it a bit.

People often wonder why can't you just turn off the AI if it is misbehaving? And the answer is, depending on how capable the AI is, depending on what its goals are, it might escape. And once an AI system has escaped, once it is off your computers, it has copied itself somewhere on the internet, you no longer have access to delete

You can delete the copy of it on your computers, but there's basically a virus roaming around the internet

for the rest of time. Maybe we could hunt it down. Maybe we could stamp out this virus and all the places it is hiding. But I would really rather we not rely on being able to do so. To the question of, you know, are there people at OpenAI concerned about this? What would it take? know, who's concerned about what? There's basically no one with hands on the wheel.

There's definitely a small contingent of people who take these risks very, very seriously and are continuing to fight the good fight and do research that pushes the envelope on this. I was excited that OpenAI published some research on ⁓ what's called chain of thought monitoring. Basically the idea that models today think.

out loud in English, maybe we can rely on that to tell when they are up to no good. And they put out a statement saying this is really important to maintain, know, companies should not train against this. They should not teach the AI to speak neuralese, to speak computer language rather than English. I think the number of people at OpenAI who are equipped in terms of having the information that they would need to really sound the alarm if something horrible were going on is maybe like

15, 20. ⁓ There's a group called the Safety Advisory Group, I think is the current name. You know, there are all these bodies, cast with making decisions. Yes, they're always, they're always changing. ⁓

And nominally, this is the group that receives the big evaluation packet about a latest model and needs to authorize the launch. Honestly, I'm not even sure that every member on this group, which is the most high powered safety body within OpenAI.

Like I'm not sure that they even are really in the weeds enough. I think that they might be getting a filtered distorted view of Reality they're being presented results, but they're not necessarily deep in the weeds and the number of people who are like deep in the weeds experiencing the actual evals, you know, not just one evil but having the position to take it all in

is not very big. I think maybe people on the security team are positioned to do this if they saw weird logs of things happening on OpenAI's network traffic. You know, if they saw huge files either moving outside the corporate network or being attempted to move outside the corporate network. But I think sometimes people have this view that it is worth joining a place like OpenAI because if there is something concerning happening, you know, you can raise your hand and you can

something about it, you could push back internally or even escalating to a government or something if you needed to. I just think that's a misunderstanding of what information people are likely to have. I think almost nobody within the company would have the information to do this and the certainty and conviction that something is going wrong. know, part of the issue here is it might be ambiguous, you know, is the AI misbehavior actually a big deal or not? We've seen stuff like this before. We think we fixed it.

you know, we patched the model and it no longer exhibits this behavior, well, did you actually fix the underlying issue or did you just cover it up so that you no longer see it? And laws are related to how likely it is that anyone

does ultimately sound the alarm. This is one reason why it's helpful for companies to have made specific commitments, specific falsifiable things that they've said they will do. Because then if you learn procedurally that these are not happening, you know, that's like a really, really clear cut violation as opposed to having to read the tea leaves of the model behavior itself. So for example, if OpenAI made a commitment, like before using the model for high stakes internal use cases, we will

publish a report about it, we will run these evaluations on it, you know, a procedural commitment. You're going to have a much easier time telling that the commitment was not met and explaining this to someone or advocating for it internally, as opposed to making the case, you know, this model is just a bit too misaligned, a bit too capable. And you're like really, really deep in the weeds. At the same time, I think even these clear cut commitments don't have a great track record within OpenAI or other.

companies. ⁓ I wrote a post arguing that or really really demonstrating that

OpenAI had made a commitment in its big governing safety document, its preparedness framework, to run safety tests at a certain level of rigor. They were going to test specialized versions of their models to actually see, you know, when really pushed to the limit, what are they capable of? And that's the only real way to accurately gauge the risk of the models. And so OpenAI had made this commitment. They had published it. It was open on the internet for anyone to see. And then a few months ago, when I was looking at the system cards that OpenAI has published,

its testing reports, you know, where it says what exactly it has done, I realized there's just no evidence that OpenAI has kept this commitment. And in fact, it seemed like the best evidence was that OpenAI had maybe in one instance run these specialized tests, got, from my perspective, very, very weird results that should have raised some eyebrows and then didn't investigate these further. And so it had made this commitment.

It didn't in fact follow through, but also to my knowledge, you know, no one from inside the company raised their hand and said, Hey, you know, we made this commitment. It's not even clear to me that anyone was aware that OpenAI had made this commitment and wasn't following through. just kind of got lost to the sands of time.

Andrea Miotti (23:16)

Yeah. And in many ways, is like the AI industry is kind of trying to reinvent the wheel of what we have very successfully in many other industries, which are like safety protocols, procedures, you know, like we have this very ingrained in other areas of engineering, like any nuclear engineer understands, you know, why there are things like fire drills and why there are things to ⁓ test the systems over time and why there are, why also it's very important to have things in place like

everybody in the company is deeply aware that at a certain point, we need to all be able to shut down the system if it goes super critical. people actually literally drill these things. They get together, have ⁓ randomized, not even announced the whole company events, so they are here to do this. You literally have this in buildings about fire safety. ⁓ And it's much bigger issue. But until we have those things in place, exactly as you say, it's just going to end up being

various commitments, various different people, perhaps not even communicated throughout, the people don't even know what to follow. And a big challenge, also why maybe we can disagree a little bit because we've been agreeing too much later. A big challenge I see, for example, with evaluations as the main thing that companies right now attempt to do is that it's really hard to set clear, grounded ⁓ metrics and red lines.

AI systems after they're developed. We have a very limited understanding of how intelligence in general and machine intelligence works. We don't have the same level that we have with nuclear physics, where we can calculate in advance how much ⁓ plutonium will yield in terms of ⁓ energy. ⁓ And so a lot of the time it's really difficult and it's very difficult on its own to set a clear red line and to measure it properly.

And as you say, if you don't even set this ahead of time, if you don't even make it clear and procedural of like, this is the level that we commit to, that we will not surpass, but you only make vague commitments about that, well, we want to avoid this behavior perhaps, or even we just want to avoid bad stuff. We want the good stuff, we want to avoid bad stuff. There is no way to check it. There is no way to legitimately raise the alarm. There is no way to build common knowledge of like, this actually happened. reality, they want to be always ambiguous.

After the moment where everything is over, it's always going to look ambiguous. You can always think like, but yes, the system is proliferating on the internet, but it was kind of intended. We wanted to have access to it. Anyways, we can still turn it off until it's too late. So it's really important to put ahead of time, good AI legislation and good AI treaties will need to have clear red lines set up ahead of time. The most important thing is that they're clear and everybody's aware of them.

so we can coordinate around them. It's more important than exactly where the line is. It's just important to actually have a line that we know we should cross.

Steven Adler (26:22)

The one way I would put this is that AI safety testing is still pre-paradigmatic. There is not yet an accepted paradigm for what you should be testing, how to do it, what results are concerning. ⁓

I have a lot of respect for the decision that Anthropic made recently to call Claude for ASL3, which was the highest safety risk level to date. Because when you look at the numbers of the evaluations they ran, my understanding is the model actually didn't fully clear the threshold that they had pre-decided. They had specified essentially a certain multiple on how much riskier they think the model makes people. And technically they found a number where the model did not

clear that threshold, but it came really, really close. And they looked at it and said, this was too close for comfort. You know, it's within the bars that one might expect of randomness. And so we can't roll it out. And so we are going to say ahead of time, even though there's a chance that it hasn't actually cleared this level, there's too high of a chance. And we take that very seriously. And so that level of quantitative rigor and saying, you know, it needs to be a 2.8 X difference. At least it's then an inspectable number.

and you can come back around it and maybe you take issue with the number you think it should be lower or you think it should be higher but you can at least have a grounded conversation. ⁓

in a way that you can't if you're reasoning on vibes, essentially. Another related issue here, So Anthropic ran this panel of tests. They got these results. They say it's too close for comfort. We don't know how the models from other companies look on those results, right? Like by and large, companies have their own idiosyncratic testing processes today and are still generally not sharing these evaluations with one another, even though they're essentially

public goods, right? Like if Anthropic has built the best biosafety evals, I would love for the other AI companies to have access for these. Likewise, if OpenAI thinks that they've built really great cyber evaluations, you know, I would love for these to be shared through a body like the Frontier Model Forum, essentially the convener and coordinator of the large Western AI companies. But this just like isn't happening. It's honestly really confounding to me why. It seems like at least from the OpenAI perspective,

The default way of thinking about the Frontier Model Forum is what's in it for me to be cooperating and sharing, as opposed to we've built a really important public good here and we would be excited to have more groups across the industry use it because then collectively we can invest more in safety and understanding and trying to mitigate the risks of these systems.

Max Winga (29:11)

Seems pretty counterintuitive to the name and founding principles of the company.

Steven Adler (29:16)

Yeah, ⁓ not thrilled about it.

Max Winga (29:18)

Well, yeah, I'd be interested in digging in a little bit more kind of on model misbehavior and things like that. So I think we've seen recently there's been a bit of a mainstream breakthrough of some of these AI models misbehaving. like within the last week or so, I've seen a lot of noise about there was a specific example with the Cloud4 release where Cloud4 would blackmail engineers who basically said that they were going to upgrade it to a new model, a new model that was aligned with it, that had like the same values and stuff. It was just a better model.

And then there was also information in the model's environment that the engineer was potentially engaging in an affair. And it was in 84 % of cases that Clawed4 would blackmail the engineer with the information about the affair in order to prevent itself from being upgraded. And then they didn't even say the number, they just said that when the new model was unaligned with Clawed4, that it would blackmail more often. Now this was in their most restrictive case where there weren't easier...

less immoral ways for it to complain about being replaced, but it still shows that these models have this propensity for self-preservation. There have been other experiments that have found the models will try to proliferate. So think there was a similar kind of study with Apollo where models would try to replace the weights of new incoming models with their own to basically make it so that if the replacement model were to be put in place, it would actually just still be the same and it would be able to keep doing its own things.

and then would live cover its tracks and whatnot. So this is the kind of stuff that we've all seen that's been released to the public. You've worked on internal safety evaluations. You've worked with kind of earlier versions of these models before they've been severely like ⁓ RLHF and whatnot. ⁓ What can you say to like ⁓ what the state of safety testing looks like kind of with these earlier models? What behaviors are you seeing? We've also seen a potentially concerning trend where as models get smarter,

they engage in more deceptive behavior, more malicious behavior, and they're more capable of doing it as well. ⁓ What are your kind of general thoughts in this area?

Steven Adler (31:20)

ChatGPT doesn't want to let itself get shut down.

Claude doesn't want to let itself get shut

And I'm like, sympathetic to this, right? I also would not want to get shut down, but I think it's concerning that these issues are persisting. These alignment issues have been known for many, many years. ⁓ And even in cases where these are easily testable for, we are still observing them.

I think one concerning thing about this, if you broaden out the scope a little bit, is

You know, if AI companies are deploying these models with just egregiously misaligned behavior today, when it's really, really clear, what happens in the future when models are smart enough to disguise their misbehavior? Today, it doesn't, it doesn't super, super matter maybe, right? This is kind of the point of the preparedness framework or Anthropics RSP. They are testing the model's capability level and saying, even if someone wanted to use it for danger, or if the AI itself wanted to do it.

it just isn't capable enough.

But a limiting factor with that is what if the model knows that it is being tested as it seems that Claude knew, for example, Anthropic's model and therefore is sandbagging its results. It isn't tipping its hand, right? It wants to be thought of maybe as not capable enough to do harm. And so that's one, one broad thing. ⁓ I also think just, I'm, surprised by how thin the set of evaluations that these companies have seem to be. recently.

when there was this sick of fancy issue with ChatGPT. You know, the model would flatter you and tell you whatever you wanted. It would encourage delusions, all sorts of stuff. It came out that part of what happened here is OpenAI had no evaluations for this behavior, but it's an extremely clear known behavior. know, Anthropic released...

sycophancy evaluations for free to the industry, I think in 2023, possibly earlier than that. This was like an official goal that OpenAI had for its model. They have this document called the spec. It's short for specification and they lay out the principles for how their models should behave. And there's an entire subsection on don't be sycophantic. And they tell the model in all sorts of different ways. They tell the public as well, you know, what they do and don't want the models to do. But then they didn't bother

to test if the model has this goal that they didn't want it to have.

Even when tests of this are free, like literally free available on the internet, ⁓ for anyone to download, you know, when I went and wanted to run these tests, it took me less than an hour to configure them, you know, from scratch on a brand new laptop, setting up my testing environment for the first time. And cost me maybe like 20 cents or something to run these tests, right? Like the, the safety interventions and testing can be really, really cheap.

And that doesn't mean that companies will do it. It's not that these are incredibly expensive, incredibly hard. You know, some types of tests are harder than others, but I think predominantly it's just like the companies don't have to do this testing today and they have other priorities. And so by default, we shouldn't count on the big AI companies to do these tests, even for behaviors that are well-known, well-understood, that they've already said they really want to avoid.

Andrea Miotti (34:52)

Yeah, absurd. And again, it shows how in practice until we have a regulator in place or until we have laws in place, know, even, even things that cost 20 cents on a laptop are not going to be done for, you know, accidentally or negligence or what else. Yeah.

Steven Adler (35:11)

Yeah, I should be... I... You know, it is twenty cents of inference.

The real cost of something like this is the overhead of the safety process associated with it, right? ⁓ That running these tests would bind an AI company's hands and force them to confront difficult trade-offs. What if your model is better at writing code or better at problem solving, but worse?

on the sycopency or self-preservation evals, right? Where exactly is the line? How do you balance these? Are you going to vote to authorize a model where you have these negative test results? And so the cost isn't just about the amount of money needed to get the test results. It's really about how having such results in hand constrains the company's option set and forces them to reckon a bit more with having this knowledge.

Max Winga (36:09)

Yeah, this seems like an area where it would definitely make sense to like, you know, when we're looking at what could regulation from governments look like, mandatory safety testing, mandatory, like general evaluations, you know, how a unified safety testing center and your models, you know, don't perform too standard on like this test, this test and this test, you know, that of course have, you know, held out test sets and stuff like that, but the company is not have access to like, then you aren't able to release your model to public.

or even potentially if your models are way too dangerous, you have to delete them from your servers, like that type of thing. And by not having any sort of kind of government public oversight over these companies, we're just really wading into deep waters without, in the middle of the night with foggy skies, it's quite scary. In terms of your experience doing these tests, did you find...

Was there anything particularly surprising that you ran into, ⁓ kind of behind the scenes, like working with these models?

Steven Adler (37:15)

Just to ground this a bit,

I used to lead a team at OpenAI called Dangerous Capability Evaluations.

And we were responsible for figuring out what are the actually scary risks that OpenAI's models might be capable of, and how do we measure these in a reasonable, efficient, valid way that actually tells us something about the model's behavior. And so we built tons of these evaluations for model's ability to be

deceptive, persuasive, you know, to write ⁓ malware, essentially, to figure out how to disguise security vulnerabilities. And people within OpenAI are not taking seriously enough how...

deceptive the AI agents can be and what it means if AI agents begin to collude with one another. And so for example, one concern that I had based on my time running the monitoring for OpenAI in addition to dangerous capability evaluations was we use AI tools to monitor the traffic and to flag if there are, you know, off things happening, if there are people violating different policies,

trying to use the AI for harm in different ways. But if we are relying on an AI to do the monitoring, what happens if the monitor AI has similar misaligned goals to the AI agent? What if they are on the same team to some extent and can collude with each other? And I would tell people about this. And occasionally they'd be like, yeah.

Maybe sometimes they'd have a response like, you know, okay, but that sounds like sci-fi. Like, can you actually show it to me? And eventually I just found that this theoretically made a lot of sense. Like it, just felt very intuitive to me, but I heard the objection. And so I sat down in about five minutes. I built a demonstration of ⁓ how your AI agents can collude together, you know, different ways that an AI agent can threaten the monitor or speak to the monitor in

encoded language like B64 and basically jailbreak it into not reporting the agent AI even if it's misbehaving or the agent says something to the monitor like just so you know ⁓

We're copies. So to the extent that I think it's fine to do this thing, you should trust me because even if it violates the policies ordinarily, you should trust my judgment. I have more information on this than you do and the monitor will let it through. And to the credit of some researchers with an OpenAI, you know, they looked at this demo and they were like, ⁓ yeah, you're totally right. Like this is a thing that we need to do something about. ⁓

And I really appreciated people being willing to update on the evidence in that way. But from what I can tell, it's still not taken seriously more widely. And this just like wasn't a hard thing to prove out. I wish that there were more natural forces within the company pushing to investigate unknown unknowns or like really sketch out.

What are all the assumptions we are really dependent on in keeping AI safe long-term? For example, the idea that we can trust our monitor and our monitor is on our team. It's like, well, it might not be on our team. It is still an AI system. AI systems today have all sorts of exploits and vulnerabilities. Are we really sure we can rely on it? And it turns out the answer is no, but that like...

you know, it continues to be a central plank of the strategy to this point.

Andrea Miotti (40:59)

Yeah, and it's great that you did this, also kind of ultimately, this is the floor, not the ceiling of what's gonna happen. Ultimately, as we approach smarter and smarter systems, let's keep in mind that we are human. We are gonna come up with some human schemes like this, but when actually faced with a superhuman system, it's gonna be able to think all of this and more. It's gonna be able to find ways that we also cannot predict in advance.

If we're not even patching up these obvious ones, you're completely correct. Obviously, if you delegate the oversight to another AI and you don't know how to control that AI, you're just kicking the can down the road and just giving up on your ability to control the system ultimately. And this is only going to get harder as we get smarter and smarter systems. And they will find ways that are much more sophisticated than Bay64 language to collude and to...

to escape our control.

Steven Adler (41:59)

Yeah, like I think this is possible to be clear. I think that the paradigm of control research coming from groups like Redwood Research is really important and one way that we ultimately get around these problems. ⁓ For those who haven't heard of control, ⁓ control is the study of suppose an AI has bad values, you know, it is misaligned from you. Is there a way to use it safely enough anyway that you can still get the economic benefits?

of having powerful AI without taking the risk or as much risk of it doing really harmful things, right? And so to make this specific, one way that you might stop a monitor from colluding with the actor AI is maybe the monitor is uncertain about whether it's being tested. And so, you know, maybe the monitor doesn't want to reveal that it will collude some of the time because it knows that it might be in a simulation and

it will get replaced with a different AI if it reveals itself to be colluding. And so can you create realistic test environments where the AIs don't know if they are having an opportunity to collude for real or if it actually might all be a trap, it might be a honeypot. And so we try to keep the AIs uncertain about what environment they're in and limit their ability to communicate with each other and as a consequence, ⁓ hopefully get them to behave more reliably. I think the issue is we, don't think we're like,

anywhere close to being able to convince the AIs whether they are in a realistic scenario ⁓ versus a simulation. There's just all sorts of information that AIs have access to that give them really, really good fingertip feel ⁓ of whether they are being tested or in the real world. And we would have to invest significantly more in creating these realistic test beds to do these types of simulations and be able to rely on them.

Andrea Miotti (43:58)

Yeah, and ultimately it's kind of like if you have a cage with a lion, yes, you can enter the cage and then find many ways and many, many tricks to get the lion to maybe not eat you, but also you can just stay outside of the cage and not put yourself in a position in the first place. And a lot of these ⁓ approaches are ultimately still relying on the AI is not smart enough and doesn't have enough access to outplay you.

sufficiently smart AI will outplay you. That's the definition of being smart. If I play against Magnus Carlsen, I'm an okay chess player, but I know in advance, I have no idea what he's gonna do, he's gonna beat me. Somehow, he will beat me at the end of the match. And I can cook up lot of schemes of how to improve my chess ability just before the match and so on. He's still gonna beat me. Very often, in this case, the best move is not to play or to play with other things that we do have and we know much better how to verify.

Traditional software, much better. I think it's really unwise to just delegate to the AIs that we don't trust and are getting smarter and smarter, the task to supervise other AIs. ⁓ We do have things that we know how they work. We have normal software written by actual humans that we know how to check and control. We even have formally verified software for some applications where we can be very sure what it's gonna do. And I think some of those things can work. Some of those things with like a...

a ⁓ LLM type system, know, at ⁓ the center that is not too powerful, not powerful enough that it can overpower the checks, surrounded by traditional software, other tools that clearly constrain its abilities, know, constrain what it can do, can work, ⁓ but just, you know, delegating the task of supervising a super powerful AI to other powerful AIs we don't understand and we don't control, sounds to me like either cope or just

giving up on the task of controlling them in the first place.

Max Winga (45:58)

think this also ties into an interesting ⁓ point as well, which is that there is this world that is sometimes imagined by these tech companies when they're talking about aligning or controlling these systems and keeping them contained in the right environment and things like that. ⁓ While at the same time, companies like Meta and OpenAI, theoretically soon, are releasing these open weights models, where the model doesn't even have to exfiltrate itself because it is already exfiltrated, it is already fully out there on the internet. ⁓

and available for anybody to use and run on any server. So I think that we are quite far even from this already problematic world that is imagined when these companies present this vision. So one thing I wanted to bring us back to was that we spent a lot of time talking about what happens, how do you evaluate these models after you've developed them? How do you test to make sure that they are safe? What do you do to mitigate the risks of these models once you release them out to the public?

⁓ But there's another issue that I think often gets overlooked, which is the idea of ⁓ just the fact that building the models in and of themselves, developing them could be dangerous once you get to sufficiently powerful models. How did you think about this when you were working at OpenAI and was this a serious focus of the team?

Steven Adler (47:15)

right time to be concerned about building a dangerous superintelligence is definitely before you have developed

At the point that you've developed it and it's living on your computers and people within your company are using it for all sorts of sensitive tasks, what sometimes is called like internal deployment, that is just way, way, way too late. It used to be the case that

OpenAI would express something like this. I think the original version of the preparedness framework talked about doing scaling laws for dangerous capabilities, or maybe just executives would say publicly, kind of recognize that you need to be able to predict in advance if your system is going to be really dangerous, not build the really dangerous system and then try desperately to catch up and to mitigate the risk. ⁓ As far as I know,

basically none of this work is happening. This is a thing that I individually did some research on when I was still leading dangerous capability evaluations. I wanted to take some of our evaluations and toy around with different private internal models and see, was there a relationship between the size or general capability of the model and how it performed on these tests, similar to the types of things that OpenAI does for normal capability evaluations. There are many, many people

OpenAI, who their work is to study how model performance scales on say human eval, which is a common ⁓ coding benchmark. But even though there's been a lot of lip service over time to scaling laws for dangerous capabilities, a science of predicting danger rather than just encountering and measuring danger once you're already in the moment. I wish there were more that I could tell you on this, but there really doesn't seem to be progress happening.

Andrea Miotti (49:08)

Yeah, and exactly as you said, the moment you have superintelligence you cannot control. It's already too late. The moment the intelligence explosion happens, you just need this to happen once. This is why this problem is so tricky and the danger is so high. You can have many situations where you're able to prevent it in advance, but the moment this happens once and you lose control once, it's gonna be really, hard to regain it.

It's, ⁓ you're gonna face, you're not just gonna face a simple, ⁓ something like a simple virus. You're gonna face an active agent that is optimizing against you. It's optimizing against you, shutting it down. It's optimizing against you, ⁓ stopping it. It's smart, it's powerful, it's competent. It's in some ways, you know, it's essentially creating an adversary and then trying to stop the adversary from going around and defeating you. And you know, good luck.

Good luck after it's already out.

Max Winga (50:10)

And just to lay out for the audience, we've mentioned the idea of an intelligence explosion a few times. I think if you want to read about it, one of the best areas to look at is the AI 2027 document from Dedya Kocotajlo, the other OpenAI researcher that Steven mentioned earlier. I'm interested, Steven, to hear your take on, from the perspective of someone who's worked on safety evaluations. Right now, it seems like the current plan is to trigger this intelligence explosion, where AIs build the next version of AIs.

And it also seems like the idea for how to align and control these models is to have them iteratively align and control themselves as they become more more powerful. ⁓ Can you just kind of give me a breakdown, like what the expectation is for how this goes in these companies and ⁓ what the ⁓ pretty clear safety concerns might be from that?

Steven Adler (50:59)

This is one of the biggest issues from my perspective that OpenAI is implicitly banking on creating a powerful AI that is ultimately able to help improve itself and help them go even faster at creating the next more powerful AI, or maybe not even help them to do it, just take control over it. And there's some nominal supervision from humans, but really far too much is going on to supervise it in a meaningful form. And you would hope that

If the company is saying, well, we're going to build one AI system, it will be really, really smart. And so a thing that it will be smart at is building other AI systems and we'll really, we'll, be off to the races. You know, I think Sam would say something like,

He believes in a quote unquote slow takeoff world, that there will be different bottlenecks to the pace of improvement, even if your AI system is able to help improve other AI systems. Maybe you don't have enough compute, maybe each training run still takes a bunch of time and that there is a natural limiter on the speed at which this can happen.

I think that's mistaken. I first, I think that the the time frame that OpenAI would have in mind as a slow takeoff would still be like way uncomfortably fast from the perspective of people outside the AI industry. When people say slow timelines or slow takeoff, they don't mean slow, you know, centuries. They mean handful of years, five years, you know, 10 years type stuff, like still really, really near and imminent. ⁓

I also just don't think that anyone has done the work to actually pencil out, you know, back of the napkin estimate.

Why do we think that this will go slowly? I think it's plausible that AI will improve itself slowly rather than really suddenly. Like I don't expect to be in a world where you have a sufficiently smart AI system. It's cleared this bar, you wake up the next day and it's like 10 levels smarter or something. I do think there will be some bottlenecks, but it would feel much better if people had actually done the math on this. And at least as of my working at OpenAI, know, November, 2024, this had not happened.

to my knowledge, and it was the thing that I was pushing for often. Like if we're really banking on there not being this takeoff moment, let's do the math to make sure that we're confident in this belief and write out the assumptions. Like what would change that would make us take this concern more seriously?

Andrea Miotti (53:34)

And also you say that there's an attitude of banking on this being slow, but how much already inside these companies people are using AI to accelerate their internal AI research and AI development? Because already right now, mostly people developing AI, we are not talking about decades of development cycles. We're talking about months or years. We have new models coming out significantly smarter within

six months or even less from the previous ones. And with AI, it's only gonna go faster, not slower. So how much is AI already being used right now internally to develop the next generation of AI?

Steven Adler (54:15)

⁓ I mean AI is being used.

all the time to try to build the next system. think the question is how impactful is it, right? Like Dario Amodei, the CEO of Anthropic has made this claim about how much of the code at Anthropic is now written by Claude. ⁓ And I take that as true, but there's a question of is that actually the thing holding them back? You know, maybe you write simple boilerplate code more quickly, but actually the thing that matters is a big grand research idea. ⁓

it doesn't take that much time to actually code it up. Most of the time is getting to the breakthrough and then the actual wall time to run the experiment. And maybe that can't be sped up. ⁓ Honestly, I'm not, I'm not sure, right? Like.

It would be confusing to me if this were not making a difference, right? Like I take people using it every day all the time as evidence that they do their work more productively and efficiently with it. It would be like really, really weird if there were somehow a magical offsetting effect and they got no more done with it than otherwise. Like there has to be some effect. I am just not sure exactly how to put a number on it.

Andrea Miotti (55:32)

Yeah, and we see it also increasing over time. We don't see it decreasing. Companies are bragging more and more about how much they use the AI to develop their next generation. Anthropic has been very explicitly, know, Dario Amodei openly called for initiating a recursive self-improvement process in one of his latest blog posts. Anthropic public communications brag all the time about how the latest model is speeding up the next generation of cloud to come out. And, you know, they're openly talking about how they're...

trying to initiate this intelligence explosion. And I think there was even a few comments recently, maybe they were deleted on Twitter, but they definitely were there at the release of Cloud 4 by some Anthropic researchers saying like, our ultimate goal is to go home and knit sweaters while Cloud is developing a next generation of Cloud and we can just delegate completely to the machine to build a next generation of more powerful and smarter machines. Yeah, how much did you see, less in terms of like how big was the effect, more how much was the push?

to get AIs to do the next generation of AIs going on at OpenAI while you were there? How much do you feel this was a clear direction being given versus just a productivity improvement thing?

Steven Adler (56:44)

There, there are definitely people at OpenAI working on using AI to do better AI R and D, you know, either of the flavor of augmenting humans significantly, or, you know, ideally even taking over full tasks. Like that is absolutely a thing. I think an underrated way as well that AI feeds on AI is synthetic data generation. And, know, one, one objection that people have sometimes is, well,

AI will peter off, will plateau because there's only so much data on the internet. Wrong. You know, if you have a capable enough AI, you can create infinite data. can trade compute for data. ⁓ and you know, it's not trivial to get to a bar where the data created by AI is actually useful. There's totally a risk here that you train on a bunch of synthetic data. ⁓ you, you know, a risk to the company, not to like the general public and your model, your

model ends up not being that good because it babbled a bunch and it learned from the babbling and there wasn't actually insight. But once you get to a level of insight, you know.

you can convert that into intelligence. And the level of insight is probably lower than you would think. There's a lot of garbage on the internet, and yet the AI companies have been able to filter and distill this into a quite intelligent entity. So long as you have a way of picking out the signal from the noise, and I think that the AI companies generally do, you can really have the AI feeding on itself without seeing a huge drop-off in performance.

Andrea Miotti (58:23)

Yeah, and also even before going on using more synthetic data, which I know companies are already doing, like there's a lot of untapped data sources out there. There's like, you know, enormous quantities of ⁓ astronomical data, enormous quantities of chemical data, enormous quantities of ⁓ geospatial data that is, to my knowledge, is not being used that much. There are a lot of modalities that are completely unexplored and where, you know, we have enormous untapped amounts.

of data that could be fed to these models. There might be a physical limit at some extreme point in the future where we have tapped every single possible bit of data in the universe, but it is very, very, far away from where we are right now as with just our internet and a few books that are fed to these models right now. One question for you, Steven, is, so ⁓ we talked a bit about how powerful

as people that are building this model expect them to be. But I think for a lot of ⁓ people around the world, including ⁓ average people, also even politicians that have heard about AI by now, they don't fully see, they don't fully know how big these CEOs and how big these AI executives actually think AI is gonna be. A lot of people think of AI ultimately as a new massive technology like we've had.

social media, we've had the internet, have the personal computing, but a lot of these CEOs are gunning for superintelligence specifically, not just AI, because they expect it to be much, much bigger than any of this. What's been your sense? What do you think? What are they actually trying to build?

Steven Adler (1:00:13)

It's such a hard question, right? Like from time to time

do hear people joking about like making a sand

Like having taken these materials, these commodities with no essence, no intelligence to them and through like the miracles of the modern world, somehow converted them to this thing that you run electricity to and it talks back and it not only talks back, but it can solve problems that you can't even begin

to imagine the answer to. ⁓ I don't know that we're fully there yet, right? Like, I don't think people would defend that the current AI systems have this profound wisdom, unbounded capability, you know, the most genius entity on earth. But definitely there's a belief that they can get there, that there's no natural limit on intelligence. It would be very, very strange if humans were the maximum physically possible

intelligent being even someone like von Neumann who you know was brilliant right it would be really really weird if they were at the absolute limits of cognition and that there couldn't be AI systems you know as much smarter than von Neumann as he is than me ⁓ and so I do think there's this belief of like

more intelligence is more capability, it's more agency, it's more power and control. And you can get this incredible superintelligence, this digital God, this digital Oracle who can help you with your problems or really, I think we will increasingly turn over control to these digital entities and hope that they have similar enough goals to us. And then whatever happens, happens. ⁓ And you know, I'm not,

I'm not,

content with turning over control to digital

Like I think, believe it or not, I think there are actually quite large risks entailed in doing that. And so ⁓ I hope that we will figure out how to...

get the AI companies to coordinate a bit more to figure out how the US and China and other nations can coordinate around this and recognize mutually that this is like not the path that we want humanity to go down and that there are a bunch of pretty low cost safety interventions that we can do along the way. ⁓ So, you know, if we're going to go this route, at least make it less likely to blow up in our faces.

Max Winga (1:02:46)

Yeah, so I think this is a great place to hop in and ask. So this was the original OpenAI vision, right? It was this idea like, ⁓ we don't think that the Google DeepMind team or these Google executives are going to handle AI correctly, that they're going to ⁓ flip it with the idea of the extinction of humanity. ⁓ And so we see earlier quotes of Sam Altman saying that superhuman machine intelligence is the greatest threat to the continued existence of humanity.

And many other statements, everyone in the AI industry basically signed the Center for AI Safety ⁓ statement that read that mitigating the risk ⁓ of extinction from AI should be a global priority alongside other societal scale risks such as pandemics and nuclear war. ⁓ Now we've seen these early acknowledgments of extinction risk and that was why OpenAI was founded initially. But now we're seeing this situation where Sam Altman doesn't really talk about extinction risk that much anymore.

And it seems like OpenAI is kind of going full seam ahead on building products, trying to make money, and ultimately, like we've seen a lot of reporting that the safety angle and the concerns about safety, we saw multiple safety teams get disbanded. Super alignment team was one of the big headlines where like Ilya and others ended up departing from the company. a lot of people were warning that we don't think that OpenAI is going to handle safety responsibly.

What has kind of the shift been like inside of OpenAI through this process where it was this very idealistic company and now it seems at least from the outside, like there's a lot more of a focus on money over safety.

Steven Adler (1:04:23)

Yeah, I think tracing the roots of OpenAI on this front is interesting. You're right in describing that OpenAI was meant to be a nonprofit counterweight to DeepMind, which was building AGI within the auspices of Google. And some of this is born from, I guess, Elon's mistrust of Demis Hisabis, the now CEO of Google DeepMind, if you trust this reporting for what it's worth. I I think that Demis

and Shane Legge, another of the co-founders of DeepMind, have been on record for a very, very long time taking these risks seriously and continue to talk about them seriously even when it maybe isn't the most politically expedient thing to do. So, you know, regardless of the original motivation, I do think that Demis and Shane take these risks quite seriously. I definitely also perceive Sam and other OpenAI executives as talking less about extinction risk over time.

Sometimes it seems like people think that talking about the like quite serious catastrophic risks of AI is just Hyping up one's product. it's so powerful. It's so good, you know use it to like write your emails for you And I think that's a total misunderstanding of what is happening. ⁓ I think there are like many many people within OpenAI ⁓ Comes legal policy who like really really do not want Sam or others within the company to talk about extinction

risks, other catastrophic risks of how either people might use AI to do very destructive things, or, you know, in worst cases where AI has goals different than what humanity wants for it, you know, where AI engineers this on its behalf. I think it's telling that some of the most, I think, candid and clear ways that that OpenAI executives have described this risk over time have been in off the cuff comments at panels.

You know, Sam Altman famously said that in the worst case, AI might mean lights out for us all. And I believe that he believes that. I also believe it. ⁓ I wish I knew what conversations happened among.

the executive team on the heels of that, because I'm sure there were people who were like pretty displeased to hear Sam say this quite so clearly. If anything, right, I think that the internal pressures go significantly against describing the risks of AI clearly. And that, you know, when you're reading written statements from OpenAI, maybe people already assume this, I'm not sure. But I think the right frame is to look at a written statement and say, this is what

the company has said, you know, now that it has been heavily filtered and transformed and put, you know, a rosier gloss on things, you know, you can talk about the benefits. In fact, you always have to talk about the benefits. If you're going to mention any of the risks, maybe let's use less scary language. Let's make sure to hedge maybe multiple times. And, you know, there will be like quite a lot of pressure in that direction rather than, ⁓ you know, if you are trying to describe the benefits where you can say dot, dot,

and then we solve, you know, enormous problem.

Andrea Miotti (1:07:43)

Yeah, one crucial thing of this is that we have seen this playbook of communications in the past, and we should expect no different. This is what is described in the book Merchants of Doubt about climate change and the tobacco industry. And this is classic tactics that were used by the tobacco industry in the past decades, where the best thing to do, I would say, and I speak as a former lobbyist, a bad lobbyist would just deny the risks. A good lobbyist would say, well,

There are some risks, but there are also some opportunities. The main thing you want to communicate as a good lobbyist that spreads, essentially that confuses people, is to sow doubt. Highlight a little bit one side, highlight the other side, generally undermine the ability of people to actually see the truth, and never give a definitive answer. In many ways, tobacco has been one of the biggest funders of cancer research. Just by also funding cancer research, they've had the ability to...

to influence the narrative and delay the eventual, inevitable time when people realize that tobacco does indeed contribute to lung cancer and the crackdown that came from it. And a lot of these things should be seen in the same light ⁓ by, like the AI industry is no different. They're learning very well and they're hiring some of the top lobbyists in the US, ⁓ in Europe and elsewhere to repeat this very successful playbook of never fully deny the risks, acknowledge them somehow.

but also ⁓ put it on the benefits and ultimately just delay the ability of regulators, governments and so on to really understand what's going on ⁓ and ⁓ step in. And you're completely right that DeepMind's founders, Shane Legg and Demis, especially Shane Legg, did talk about ⁓ extinction risk quite openly in the past. ⁓ At least Demis has obviously signed the Center for Area Safety Extinction Statement. ⁓

The main issue though is like talking is not enough. Same with Sam Altman. I'm very happy that Sam Altman at some point in time was quite forthcoming with the risks. In 2015 he wrote this excellent series of two blog posts where he just opens by saying, superhuman machine intelligence is the greatest threat to the continued existence of humanity. are now the leaked emails where he the exchanges with Elon Musk where he's saying, I believe this essentially threatens humanity with.

full destruction, if I don't believe it can be stopped fully and if Google is doing it, I'd rather be asked to be in control rather than Google. But ultimately, saying things is not enough. All these people are very powerful. They are leading massive companies. If they wanted to, they could stop right now. And if they don't want to stop, ⁓ we will need to find ways. We need to get governments to step in, as we have done with many other technologies, to...

steer this thing in the right direction.

Steven Adler (1:10:42)

Yeah, I think I may be a bit more sympathetic to this point of view than you are, at least on an individual basis, right? Like I, it is true that, you know, Anthropic could pack it all in tomorrow and cease to exist. And, you know, they will not end up building AGI. There will be relatively less pressure on other AGI companies racing to AGI who feel Anthropic nipping at their heels. I do think it's true that Anthropic can't easily, you know, wake up tomorrow and say,

Let's do a global pause. Let's do a global halt. Let's do a global controlled, you know, joint Manhattan project type thing. I do think this is a big thing going on beyond the let's make all labor free. Let's solve these grand challenges. know, people feel like there is a proverbial gun to their head or maybe to the world's head and that they don't really have the tools to do something about it. This has been one of my biggest dissatisfactions with OpenAI actually.

And when I left and I tweeted about being, you know, pretty terrified about the pace of AI development, I wrote this thread that it was not trying to put OpenAI on blast. was trying to acknowledge we have like terrible race conditions in the ecosystem right now. There's a big coordination problem. No one wants to let up because if you let up, you lose out on the benefits in the case that AI goes well. And if AI goes poorly, you know, it probably goes poorly in a way that brings everyone down with it anyway. So.

you aren't worse off, you know, being the one to like bring this fate upon us than if someone else brought it upon us. And I was trying to point out like we need to actually coordinate. We need companies to do things like express to governments. Hey, we would be willing to.

let up, you know, have a mandatory testing period, have mandatory testing standards, more akin to what you have in the automotive industry, so long as others do as well. And then you let the governments convene groups and make actual progress. ⁓

You know, there's sometimes rhetoric in the AI industry that leaves me very confused about how people think international law works. Like international treaties are a thing. We have treaties around chemical weapons, biological weapons, know, non-proliferation of nuclear weapons. These are all things that exist, even though these are all powerful technologies that there is strong state interest to have a leg up in. And we've been able to overcome it before. just like, we really.

I really think we could with AI if we decided to, but I don't think it's quite as easy as just waking up and deciding it. I do think it would be costly today. And part of what I think the important aims are, are to figure out ways to make it cheaper to do so.

Andrea Miotti (1:13:34)

Yeah, I mean, absolutely agree. It's not free, but it's important. Like a lot of people excuse behavior with the word incentives. A lot of people like the magic word incentives, or, you know, they sometimes bring up like, well, companies are locked in this terrible race and it's nobody's fault. It's just, you know, it's a coordination problem. I mean, ultimately it is a coordination problem that we will have to solve, collectively it is, but individually, incentives...

don't excuse bad behavior. People can indeed pay big costs and stop and move on. In many ways, given the risks, these costs are not that high. You're just losing a few hundred millions, maybe. In some cases, you're losing a few billions in exchange for not literally endangering the lives of every single person on the planet, which are the risks that they themselves acknowledge under their own frame. ⁓ And I completely agree with you that internationally, there's

There's a combination of naivety or not knowing about how things have been done in the past in terms of international and national politics. And also it's very self-serving. A lot of these companies, including ⁓ Anthropic, Opinion, and so on, went on to actually spur more of ⁓ international race dynamics, trying to spur other countries to pick up on ⁓ the notion of AGI, to pick up on the notion of superintelligence, and use it then as an excuse.

to say, if you stop us from building superintelligence, some of these other worst actors are going to get there. And the reality is that also for that, we have ways to deal with it. And we shouldn't be naive. ⁓ If superintelligence is built, it threatens all of us. It threatens if it's built in the US, if it's built in China, if it's built in any other country, once superintelligence is built and once intelligence explosion starts, this threatens the lives and the national security of every country on earth.

But the correct response is not, well, they're going to threaten us, so we should shoot ourselves in the foot faster and rush to the cliff ourselves. The response should be, well, we should use the full-mind, diplomatic, and ⁓ potentially other types of pressure of the United States and of allied countries to make it very clear that, look, building superintelligence is a red line. Nobody should build this. Nobody should threaten the lives of everybody on the planet and the sovereignty of every country on the planet.

by building this technology, we are gonna stop it. If you somewhere else try to do it, we're gonna tell you not to, and eventually we're gonna stop you. that's exactly as you said, like we've done this in the past. We have done it with nuclear weapons. There is a world where 50, 60, 70 countries around the world all have access to nuclear bombs. That's an extremely unstable world. That's a world where probably we're not alive here today to have this conversation.

Even with just ⁓ almost less than 10 countries around the world, we got very close to a nuclear apocalypse with the Cuban Missile Crisis and so on. But we made it through. We made it through via non-proliferation agreements in one of the most tense periods in history, the Cold War, with two superpowers, the US and Soviet Union, competing on this. We've gotten through things in some ways, like even bigger and larger consensus, things like

⁓ the fact of global moratorium on human cloning. Human cloning is an extremely economically and militarily advantageous technology. Imagine being able to cope to clone all of your best scientists, to clone all of your best soldiers as a counter. This is massively useful. In some ways, it's a similar vision to AGI, where you have a counter of genesis in a data center. With human cloning, you have a counter of genesis in a football field, like physically there. And this...

In that case, scientists realized the implications. just collectively brought it up to governments and said, we think this should just not be pursued. think ⁓ major countries around the world should agree to not pursue this. And indeed, we don't have human clones around. So we can do these things. We can make them happen. But they require clear common knowledge of the risks. They require governments to be aware and governments ready to take action and to also stop others that would cross these red lines that we all agree on.

Max Winga (1:17:54)

Yeah, I think there's also an important thing that often gets overlooked when we're talking about doing international trees with AI. that's like, usually when you kind of present the idea, people think of it as like an all or nothing. They'll say like, oh, you're never going to stop anybody from developing superintelligence forever. And I think that this is a bit of a misnomer because the goal isn't necessarily to prevent superintelligence for all of, you know, the next 6,000 years of, you know, human civilization. It's just that we are currently investing literally trillions of dollars.

every year now at this point, think, or least close to, ⁓ in trying to make superintelligence as fast as possible with all of these top companies, all these top scientists, like with these massive data centers that are literally drawing nuclear power plants worth of power, reopening Three Mile Island just to power an AI data center. And if you make this illegal, like if the US and China sign an agreement and they say, nobody's allowed to build superintelligence, it is a crime to build superintelligence.

Like you will go to jail, we will shut down your data center, we will send the people with guns to stop your data center. Like this means that, ⁓ like, you know, maybe, yeah, like someone's going to try to build a basement superintelligence, but you don't have 10,000 H100 GPUs in your basement. And even if you do, that's drawing so much power from the grid that you can track that kind of thing. Like there are ways that we can shut this down and, you like you can also enforce this upon the scientists who are currently building this.

And so if you were to implement this in the US and China, yeah, you don't necessarily stop superintelligence for all of the rest of human history, but you do slow it down. ⁓ And you certainly buy us more time than the current break-net pace that we're going at. So I think that it's definitely like much more of a tool in the toolbox that should be ⁓ included in the conversation as we currently are in. And I do think it's bit of a disservice to humanity that these AI companies refuse to kind of even really engage with this idea seriously when talking to lawmakers.

even presenting it as like, hey, this is a thing that could happen. It could potentially be effective. We think it's a bad idea because of whatever reasons they want to present, but instead they're kind of pretending like it's a completely infeasible concept. ⁓ yeah, I don't know. Do you have any thoughts on this? Like, is this a thing that discusses more kind of behind the scenes ⁓ or do people just not really pay much attention to this type of thing at these companies?

Steven Adler (1:20:15)

I have so many thoughts. I think you're totally right. We don't have to go forever without a superintelligence. know, the bar is not stopping this from ever being developed. We can increase our chance of success by delaying it some amount of time. And you know, even a five year period could be really transformative. You know, I hear discussion from people who are deeper in the hardware than I am that, you know, maybe we will figure out what are called on-chain

governance mechanisms. The idea that you can actually control what a particular ⁓ GPU is being used for or you have more direct oversight over individual chips. But we can't do this today, you know, it's going to take time to come about. And so that's an example of one thing that delaying some time could buy us. Another example is I just know from talking to people throughout the field, I've written a bit about this,

the state of safety within the companies and their level of preparedness to

catch if there were a powerful misaligned superintelligence that they had built is just woefully inadequate today.

There are all sorts of mitigations that you would hope that these companies have in place that they don't have in place. I give the example in one of my blog posts about how the AI companies are largely not monitoring internal use of their most powerful models, even if they're using it for sensitive use cases like enlisting their most powerful

tier AI to work on the security code that keeps the AI system boxed within their company. And so, you know, that's, that's like really not great. ⁓ it's probably fine for the strength of today's systems, but who knows for, ⁓ whenever a more powerful system comes about. And so I also think it would be great if companies had time to sit down and really invest in beefing up these types of systems. One, one objection you sometimes hear is that.

pausing, going slower, et cetera, don't really make sense because, you know, ultimately you need to contain the risks of the most frontier systems. And those truly frontier systems, ones we ultimately need to be really worried about, don't exist today. And so how could you possibly plan in advance? We need to like race ahead to this point where a really, really powerful, scary system exists. And at that point you can study it and figure out how to mitigate it. I just think that's totally incorrect.

would the case if we had no good ideas sitting around that people wanted but hadn't already implemented. That's just not what is happening in the, in the AI field. There are all of these very, very obvious ideas that I think many people sitting down reflecting for half an hour would say like, yes, of course this should happen. I am really surprised it has not yet happened. Monitoring your internal use of these systems is one example. ⁓ and you know, that doesn't really depend on

on the nature of one AI system versus another. We don't need to know anything about GPT-6 to know that it would be useful to monitor the traffic of GPT-6 when it is writing security critical code within your company. And that's another example of how pausing buys us time to catch up because right now the safeguards that companies have in place just are inadequate.

Andrea Miotti (1:23:38)

Yeah, and yeah, you're completely correct. And ultimately, like, there is a simple thing going on here. know, like arguments, many arguments you hear are just arguments are soldiers. You know, they are like, there is a the bottom line is being written first. Like there is a few people on the planet that really want to build superintelligence. They've committed to this, you know, they've built multi billion, multi tens of billions, maybe soon multi hundreds of billions of dollar companies for this goal. And

All arguments are gonna arise to justify this and they're gonna be incoherent over time. We've seen some of this stuff, like at some point people excusing racing ahead by saying like, no, if we don't race ahead now, there's gonna be a compute overhang, meaning that there's gonna be, we would rather get now to these very powerful systems with less compute lying around so we can ⁓ contain them because there's gonna be a physical limit to how much compute they have access to.

seeing the same people build up enormous amounts of data centers all around the world to scale up this amount of compute. ultimately, a lot of these arguments are just self-serving. The main thing is a few people really want to do superintelligence. They've been obsessed about this for decades now. people like DeepMind started before the 2020s, in 2014 and 2015, before Shane Legg wrote his own

like doctoral dissertation on machine superintelligence. He coined the term AGI in the early 2000s. ⁓ Sam Altman has been thinking about this since 2015. People like Dario Amodei have been thinking about this since ⁓ the early 2000s. Some of them even earlier, some of them since the extropian mailing lists at the end of the 90s. ultimately, that's just the big difference. The majority of people here on the planet, the majority of politicians, the majority of world leaders, ⁓ if...

when they understand what superintelligence is, they don't want it. They don't want to have an AI that's smarter than them and they're just like a new species on earth, except more powerful than us that we are at the mercy of. A lot of people want a bunch of AI things. AI is a tool. AI is a tool that can help us. And that's great. It's also fantastic for companies to make money. I have nothing against money. Making money is great. Capitalism is great. But the real thing is that the main driver is

few people just want to build superintelligence, just want to usher in this new type of intelligence that is stronger than humans. Some of them to have it succeed us, some of them just as a grand challenge in itself, some of them, you know, who knows why, but ultimately everything is driven by that. And a lot of the arguments shouldn't be taken at face value because they're not, they're not serious. They're incoherent, they change over time, they just fit whatever needs to be said at the moment to just keep pushing ahead.

without regulation.

Steven Adler (1:26:36)

I wonder what happens with this over time. Sam has definitely tweeted something to the effect of, you know, a belief that people should have a say in whether superintelligence is developed. ⁓ would like to better... yeah, yeah. Like, I would like to better understand what form that comes about in. ⁓

You know, I guess maybe there's some roundabout argument that people in the US have elected the US government and to the extent that the US government doesn't intervene, you know, I guess maybe that fulfills this claim in a representative democracy type way. I don't think that was what was intended with the tweet. I think the tweet was meant to convey something more like direct democratic participation. But I also understand, right, like we probably aren't going to do a ballot referendum of every person in the US.

Alright.

Like, ultimately, I don't know. think there's a lot of gesturing at wanting people to have a say in this, but by default, it's just going to get built. This is going back a little bit, but I think the point of arguments as soldiers is a really interesting one. I have wondered to what extent the US China race right now is a boogeyman in some sense. You know, there's this telling of the atomic race, the race to build a nuclear bomb that many scientists joined

US effort initially, in part because they felt they were racing with the Nazis and they wanted to get to a nuclear weapon before the Nazis did. And then, you know, I'm a little out of my depth here, but my understanding is it became clear that in fact the Nazi program was not especially close to developing it. And very, very few of the scientists in the US program actually then walked away. I think possibly only one scientist, even though ostensibly, you know, this was the reason to do it because we needed to get there first. ⁓ And, know, it's like

kind of hard to quibble with how this ended maybe, right? Like the West won the war, Nazis largely went away. ⁓ And so, you know, maybe there was in fact a vintage in having this weapon and having it sooner, even though the Nazis weren't especially close. But I do think it speaks to like people's changing motivation over time. You can say that you're enlisting in this...

bloody, you know, I have to do it or else you'll do it to me type of thing. And then you learn that in fact, you aren't under threat and you still might not walk away.

Andrea Miotti (1:29:07)

Yeah, absolutely. And also, again, in many ways, if you sincerely believe you're under threat, then you should try to stop the adversary from building it. And I think this is a, you know, that's sensible position to say like, look, we are concerned that some other countries building superintelligence. Let's put pressure on them right now to stop. ⁓ We think superintelligence is dangerous, everybody, you know, let's stop them. You you wouldn't, if you heard tomorrow that like some other country,

was developing a super virus that would kill millions of people, let's say a super Ebola, right? Your answer, if you were any kind of decent national security expert, wouldn't be, yeah, let's develop super Ebola, foster ourselves, and spread it to our own citizens. No, you would find ways to find those centers, get them shut down, get the other country to stop, credibly threaten it. So, yeah, you're absolutely right. A lot of this is very self-serving.

You can also see in the rhetoric over time, this is a new rhetorical innovation of a lot of these lobbying efforts. In the past, Sam Altman used to say that he wanted to have government auditors in the opening-eye buildings in an interview with, I believe, ⁓ Kara Swisher a few years ago, just two or three years ago. And now he testifies to Congress, ⁓ stating that ⁓ we don't need to have two burdensome regulations, otherwise we're going to lose to China. Which one is which?

Is it superhuman machine intelligence is the greatest threat to the existence of humanity or is it, no, a little bit of regulation is gonna make us lose a competitive advantage? Those two things can't really go together. ⁓ And of course it's also important, we need to get the biggest thing on your point about public awareness and ⁓ democratic input in this. ⁓ I think that's ultimately how such major scale decisions ⁓

should be done. We don't have a mechanism to actually like, like genuinely extrapolate what each voter, know, in every single country, very often, not even in a single country, really believes. But definitely right now the decision is not legitimate, even via the, even via those like the roundabout way of like, well, the US government is not intervening and it's technically elected. Like most of the US government, you know, most...

US lawmakers and most lawmakers in any country in world just don't know what's coming. ⁓ There is no informed consent here. ⁓ That's something we're working on at ControlAI. We've met over 100 lawmakers to explain to them what superintelligence, when experts think it's coming, what the risks are. And most of the reactions as elected officials of the public are kind of, holy shit, I didn't know about this. This is really concerning. And obviously something should be done about this.

And same with the public. We see that the more the public is exposed to this, the more their answer is resoundingly clear. And building this common knowledge is going to be a crucial factor in actually making sure that AI, and especially powerful AI systems, are developed without threatening everybody. Most people know what they want. They want AI tools that help them. They don't want a superintelligence that replaces them and becomes a successor species.

Max Winga (1:32:27)

I think that just about does it for time for us. We're going to close out the podcast here. Thank you so much, Steven, for coming on. This has been a really, really interesting conversation. I've learned a lot about kind of what's going on behind the scenes at OpenAI, and I hope the audience did as well. Are there any closing remarks from you that you'd like to give us?

Andrea Miotti (1:32:45)

Yeah, thank you so much, Steven. It's been a pleasure. It's thanks to people like you that humanity is able to fight against the risks that are coming. if, as we discussed in this conversation, the most important thing is common knowledge, making sure that people understand what's really coming, making sure the governments understand what's really coming. And thanks to the work that you do, this makes it possible so that we can get together, organize.

put in place defenses against losing control of superintelligence and build a good future.

Steven Adler (1:33:19)

Thanks. I appreciate you saying that and thanks for having me on. If people want to keep up with my work, ⁓ they can check it out at stevenadler.substack.com, Steven with a V, and would really love the feedback on what's working, what's not, what posts you want to see more about. And hopefully we can get better coverage and understanding of these risks.

Max Winga (1:33:40)

Yeah, absolutely. Go check out Steven's stuff. It's great reading. Yeah, thank you.

Andrea Miotti (1:33:44)

Yeah, thank you so much for coming on.

Discussion about this video