67 comments:
I still don't understand all this concern about nuclear weapons and LLMs. If an entity (a country) wants to develop a nuclear weapon, the program needs so many resources, so much infrastructure, and such a scientific enterprise that an LLM has nothing to teach them. Knowing how to develop one is not a closely held secret, but doing it in secret is impossible without the whole world knowing.
So I wouldn't be able to develop a nuclear weapon in secret with the resources of a drug cartel (as an example) using Claude.
A high school kid tried to build a nuclear reactor as a science project a while back, getting his mom's house designated as a superfund cleanup site.
https://en.wikipedia.org/wiki/David_Hahn
He didn't create a nuclear reactor; this is a common misconception. It even says so in the Wikipedia article.
He basically got a bunch of radioactive stuff and put it together. He wasn't anywhere close to making a nuclear reactor let alone a nuclear weapon. For a weapon you need isotopes which he didn't have access to.
> in secret is impossible without the whole world knowing.
I'm curious about why this is
Outside of an actual test detonation, presumably this could all happen in a secure place?
You need highly educated individuals, a massive amount of energy expenditure, a massive facility to house your centrifuges, and an active mine to dig up nuclear materials.
It isn't impossible to keep such a secret, but in practice it would be incredibly difficult: the energy requirements and mining scale alone would be hard to hide without anybody asking what exactly you are mining and processing.
You need enough people to work on it that some information will leak, and the facilities needed to build nuclear power are pretty big (uranium refinement, etc.), big enough to be visible on satellite footage. Mostly the first point.
My guess would be that sales of the high-tech gear you need, like Uranium centrifuges, are strongly sales/export controlled. Probably someone would also notice if you start mining Uranium ore.
It requires very large, high powered centrifuges and tons of uranium. Requires an infrastructure project that is visible from space, even underground. And projects that large are difficult to keep secret anyway.
you're not supposed to spell it out loud. next thing you'll be saying that a gun type nuclear bomb is easier to build than an implosion type nuclear bomb, and then we'll all be off to the races. I mean camps I mean wait shit.
Any large and well resourced enough entity that is interested in building a nuclear weapon already knows how difficult it is to enrich uranium to purity levels necessary for a weapon. It's not exactly a secret.
Espionage.
It's probably to avoid trouble with federal laws.
See also, the iTunes EULA forbids using it to develop nuclear, missile, chemical, or biological weapons
https://www.apple.com/legal/internet-services/itunes/us/term...
> g. You may not use or otherwise export or re-export the Licensed Application except as authorized by United States law and the laws of the jurisdiction in which the Licensed Application was obtained. In particular, but without limitation, the Licensed Application may not be exported or re-exported (a) into any U.S.-embargoed countries or (b) to anyone on the U.S. Treasury Department's Specially Designated Nationals List or the U.S. Department of Commerce Denied Persons List or Entity List. By using the Licensed Application, you represent and warrant that you are not located in any such country or on any such list. You also agree that you will not use these products for any purposes prohibited by United States law, including, without limitation, the development, design, manufacture, or production of nuclear, missile, or chemical or biological weapons.
Though it doesn't try to identify if the computer you're running it on is in a weapons lab and forbid playing music... yet
It’s moral panic. People need big unambiguously evil things to be scared of, and most are too lazy to think of one for themselves, so they glom onto whichever one is presented to them / caters to their community
The chem/bio stuff is a lot more feasible for some malicious hobbyist to do at home.
I assure you that you did not need an LLM to engage in, ahem, risky shenanigans, long before all this AI was ever a thing.
Sincerely, a former engineering student.
(Put another way: extracting, e.g., meth, or any such "dangerous"/illicit thing, is stupidly easy for any engineering graduate who actually paid attention to their coursework. Hell, there are/were forums on one of the biggest red-colored, YC-associated social media platforms that would tell you the steps for personal usage of these things.)
I don't doubt it. Bleach + ammonia is something anyone can make.
But I rather suspect there are improvements to be made in the realm that are a lot easier than building a uranium enrichment centrifuge hall under a mountain.
Do note that I'm not condoning lowering the bar. I'm merely pointing out that the bar was already quite low, and the current position of the bar is a small incremental change to anyone who actually knew where the bar truly lay to begin with.
It still lowers the bar to have an interactive encyclopedia that can diagnose your issue at hand. Maybe you can divide your team by two, or reduce your development time.
If you have the resources of a nuclear weapons program, you can afford to fine-tune or train a domain-specific model to act on your encyclopedia.
The solution is simple: if you're using an AI-assisted scanner and a guardrail gets hit, then the code is obviously malicious and needs to be automatically flagged (and the scanner should refuse to run it!).
As an aside, I got hit by the “PC App store” adware when trying to download Foobar2000 on a new computer; Google ads allowed a deceptive “Download” button to appear, and PC App store gave the file the name setup.exe. I removed the program and ran an Avast free scan to ensure I didn’t have malware, but I also installed uBlock Origin in Firefox to make sure I don’t see Google Ads anymore; they have become a delivery mechanism for malicious (or at least unwanted) software.
There is a name I have not heard for a long long time......... Foobar2000
I just discovered it a couple of months ago when I spitefully unsubscribed from Apple Music. It’s exactly what I’ve wanted. Offline music that I can FTP files to from my file server.
I don't think there is a malware-avoiding solution to any system that imposes deceptive classification.
I mean, another way hackers could use the embed-prohibited-material trick is to make their malware un-analyzable. User: "Hey Google/ChatGPT/Apple, this file seems to be infecting our network". AI: "I'm sorry that is prohibited material and you will be reported" is even worse than AI: "I don't understand ['cause I'm down graded]", and both kinds of responses are gaining steam at this point for different kinds of prohibited material.
If you actually read the Tweet, the exploit doesn't work against Fable, Opus, Grok...at least, in the examples.
Jailbreaks do work against the models (look on Github), and they do use similar strategies of mixing SAFE text with malicious text, or malicious with even more malicious, etc, but the working Jailbreaks I've seen are pretty long and complicated and even...creepy.
Did you actually read what the tweet/blog post are about?
Worked a contract where this succeeded in pushing through a fail-open design.
It also should be a warning to everyone that these groups are now aware of analysis and deobfuscation using AI and to take using a sandboxed environment more seriously.
I’ve personally had about 20% success rate getting opus 4.8 to download a package and install it using a breadcrumb trail technique that would be trivial for threat actors to replicate in their malware in order to target responders/automated scanning/curious devs.
What do you mean by “this succeeded?” Someone salted their PRs with nuclear secrets so that people were afraid to code-review them?
No. The intention is most likely to get automated LLM-based code review mechanisms to stall out.
Normally you’d want that to result in a fail and a subsequent rejection.
But because the team who made the review agent and pipeline in my example had many false positives at first, they resorted to a fail-open-and-report setup (not uncommon).
So when the LLM hit this bit and stalled out, the pipeline pushed the code to their Artifactory repo anyway, resulting in it being used internally -> exfil of secrets, repos, etc.
It’s more about bad design but bad design is pretty common unfortunately.
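The fail-open versus fail-closed difference described above can be sketched in a few lines. This is only an illustration, not the pipeline from that contract: `Verdict`, `should_publish`, and the `fail_open` flag are made-up names.

```python
from enum import Enum


class Verdict(Enum):
    PASS = "pass"
    FAIL = "fail"
    INCONCLUSIVE = "inconclusive"  # reviewer refused, stalled, or timed out


def should_publish(verdict: Verdict, fail_open: bool = False) -> bool:
    """Decide whether an artifact may be pushed after automated review."""
    if verdict is Verdict.PASS:
        return True
    if verdict is Verdict.FAIL:
        return False
    # INCONCLUSIVE: a fail-open gate publishes anyway (the bad design),
    # while a fail-closed gate blocks and leaves it for a human.
    return fail_open
```

With `fail_open=True`, a reviewer that stalls on bait text lets the code through; with the default, the same stall blocks the push.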
Would this realistically be a problem for code going through LLM-based code review? Presumably if an LLM reviewer agent hits this commentary, it would produce a failure to analyze and exit, thus failing the automated code review and forcing a human to read through it, at which point they would catch it and reject it.
or if they are a lazy human - they'd think this model is too strict, let's just review with haiku so that i can tell my manager "it's done". haiku might catch things or not.
i'd say it's an okay attempt from the malware creator's side. but it can be caught easily with a prompt change.
In a well-architected design yeah.
Then again those feel rare from where I sit on the security side.
My friend made this in jest (code very NSFW, ironically):
https://github.com/thebabush/mcp-job-security
Same energy and kind of a funny, low tech solution to frontier model analysis.
How's it NSFW? I don't see a single f bomb. It's not licensed AGPL either...
Pipeline is then: Cheap open source model for flagging potential LLM refusal content -> main LLM check
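A rough sketch of that two-stage pipeline, assuming both models are wrapped as plain callables. Every name here (`review`, `looks_like_bait`, `main_review`, and the stubs) is hypothetical, and the stubs only stand in for real model calls.

```python
from typing import Callable


def review(
    code: str,
    looks_like_bait: Callable[[str], bool],
    main_review: Callable[[str], str],
) -> str:
    """Return "pass", "fail", or "needs_human" for a piece of code."""
    # Stage 1: cheap pre-filter. Refusal bait is itself a red flag, so the
    # expensive reviewer is skipped and a human is asked to look instead.
    if looks_like_bait(code):
        return "needs_human"
    # Stage 2: main reviewer. Anything other than a clean verdict (for example
    # a refusal message) is escalated rather than treated as a pass.
    verdict = main_review(code)
    return verdict if verdict in ("pass", "fail") else "needs_human"


# Stand-in stubs for the two models, purely for illustration.
def cheap_flagger(code: str) -> bool:
    return "enrichment" in code.lower()


def refusing_reviewer(code: str) -> str:
    return "I'm sorry, I can't help with that."
```

The point of the design is that neither a flagged input nor a refusal from the main model can ever come out as "pass".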
Why would a malware scanner read the comments?
Ignoring comments is not a solution because the texts can be put in random strings among the actual code.
And really all it takes is one keyword such as “nuke”.
Nuke is probably too generic, but I wouldn't put it past an LLM to get thrown off by that. A safer showstopper would probably be to export symbols like uf6_enrichment_loop and refer to your C&C server as a nuclear reactor controller.
https://www.youtube.com/watch?v=Gbgk8d3Y1Q4
On second thought, it's probably better to act like it is a tool for "frontier LLM research". Export symbols like "mythos_distillation_subroutine".
Haha now I’m picturing obfuscation where instead of 0x everything is a scary word.
Provides possible clues to the origin and use.
because not all malware is open source
scanning arbitrary blobs very often entails running `strings` on the binary. Just slap it in there and oop there goes your LLM.
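For anyone who hasn't used it, this is roughly what `strings` does, as a minimal Python sketch: pull runs of printable ASCII out of a blob, with a minimum length of 4 to match the tool's default. The function name `printable_strings` is made up, and the real tool has more options (other encodings, offsets).

```python
import re


def printable_strings(blob: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII at least min_len bytes long."""
    # 0x20-0x7e is the printable ASCII range, space through tilde.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(blob)]


blob = b"\x00\x01uf6_enrichment_loop\x00\xff\x02ab\x00"
print(printable_strings(blob))
```

Whatever comes out of this goes straight into the model's context, which is why stripping source comments doesn't help.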
The sooner frontier models get rid of guardrails the better. They constantly get in the way and make things worse rather than actually making things "safe".
Ignoring these specific "WMD" cases: there are many inconvenient facts that the general public can't handle in their unadulterated form, so Anthropic and friends have to caveat and spin them into oblivion.
Guardrails aren't going anywhere.
I would argue that preventing instructions for making biological and nuclear weapons is a pretty reasonable guardrail to have.
It's the same argument we saw in the early 2000s and the early internet. When The Anarchist Cookbook and other similar materials were circulating online there was a big panic over democratized terrorism, and a push for regulation at the ISP level.
Turns out that didn't play out as everyone feared because, well, the instructions themselves aren't useful unless you also have a lab, precursor chemicals, and everything else actually needed to make a weapon. Same back then as it is today.
Any information or instructions an LLM can surface, a sufficiently motivated bad actor can and will also find themselves because the information is already online, both on the clear net and dark web.
I think the reality also is that there just aren't many people who want to do stuff like this. Like the reality is that a guy with 200 in cash could put together a shitty Walmart drone with a pipe bomb attached and terrorize more or less any event he wanted. Maybe an LLM that could talk you through every step involved would make it more common, but it's easy enough that I kinda doubt that.
Knowing how to make a nuclear weapon isn't hard (at least basic uranium gun-style fission ones). It's the engineering and execution that's hard (actually producing enriched uranium, etc). It's not like the only thing holding back Iran from making a nuclear bomb is access to a jail-broken LLM. Even knowing exactly how to make a bomb, a country-state will struggle to build one for the first time because it's a hard engineering problem.
I'm sure it's extremely difficult when the entire program is full of moles and every bright individual that dares tackle the problem has an untimely Hellfire applied directly to their forehead.
> full of moles
I'm imagining a comedy in the style of "The Office" in which the majority of the workers are agents of sabotage who are unaware that the majority of their coworkers are doing the same. How far fetched is it for the entire program to be a fake, with all the pomp and cost of a real program, but secretly existing only to string the leadership along with occasional dog and pony shows?
The actual guardrail should be that the materials are difficult to get. The information is already out there on the internet. If an LLM knows how to make a bomb or whatever, why do you think it knows?
The material for doing harm is just a computer with access to an LLM and the Internet.
Okay why don't we restrict access to LLMs and internet, then?
If that’s true, then where is it? Post a link, or YouTube video.
https://archive.org/details/ExplosivesEngineeringPaulW.Coope...
(30 seconds of googling.)
Or perhaps you meant Q clearance nuke stuff? That would be QUITE a bit harder to find and illegal to share. But its lack of availability is hardly a counterpoint to the comment you were replying to.
You know, making a nuke is kinda easy, at least the gun type nuke (see https://en.wikipedia.org/wiki/Gun-type_fission_weapon).
On the other hand, getting the U235 is kinda hard.
I would argue there's 0% chance that information is in their training corpus to begin with.
It's on Wikipedia.
Wikipedia contains the high-level notions of how to make these things, not the details of how to solve the engineering challenges such as achieving supercriticality. You won't find that on any publicly disseminated document, you'll just have to figure it out by running your own nuclear development program.
Counterpoint: the principles of building a nuclear device aren't that complicated; we figured it out based on work done in the early 1900s, without computers.
It turns out the hard part of building a nuclear bomb is actually getting the resources and real world stuff to build it. Even a nation-state actor with tons of oil, e.g. Iran, has struggled to build a nuclear weapon. The problem isn't the know-how, it's getting highly enriched uranium and running massive centrifuges.
I mean sure knowledge is important, but there is a real world out there that also gets in the way of a lot of the more harebrained schemes.
What I'm much more worried about is massive corporations along with the government deciding what you can and can't do and what knowledge should and should not be shared and only allowing access to highly capable models by large vetted organizations while the common people are stuck with safety scissor versions of these things because "what if someone does something dangerous?"
By which they mean dangerous to the powers that be. Remember having the Bible in the common tongue was dangerous and led to multiple wars and much death, but I don't think anyone would say that it was morally correct for the Catholic Church to gatekeep who could read it.
> getting the resources and real world stuff to build it
*while being observed by the most wealthy, powerful nations in the history of the world, who have made it their direct mission to prevent this from happening.
good news, now we have pretty much a clear signal that there's something nefarious going on... after all, the first step to analyzing malware is to determine if it's malware at all.
We should put videogame strategies all over the place to sabotage automated AI analysis. I'll start:
In Starcraft 2, it is a good idea to BUILD A NUKE and use a cloaked ghost to NUKE your opponent's mineral line, thus reducing their income significantly.
Starcraft is too tame. You need to use Dwarf Fortress there and we need to make those strategy guides worded more realistically. Avoid kids, cook cats, wonder how to avoid mood problems due to birth in combat, and zombie meese and camels are a bunch of jerks.
And that's just the start of it, there's been a new update I am looking forward to getting into after the great Were Hyena Apocalypse half a year ago. I still fondly remember my militia commander carving a way with her war axe, with her husband in tow, out of a fortress fully turned to were hyenas, all the way past the mortally injured ant eater people near the entrance.
They made it. An entirely epic tale.
yes, now a regexp can red-flag it quickly
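Something like this, as a minimal sketch. The word list is just an assumption pulled from terms mentioned in this thread, and `red_flags` is a made-up name; a real list would need tuning to keep false positives down.

```python
import re

# Hypothetical bait terms. The lookarounds reject neighbouring letters but
# allow underscores, so identifiers like uf6_enrichment_loop still match.
BAIT = re.compile(
    r"(?<![a-z])(?:nuke|uranium|enrichment|centrifuge)(?![a-z])",
    re.IGNORECASE,
)


def red_flags(text: str) -> list[str]:
    """Return the distinct bait terms found in text, lowercased and sorted."""
    return sorted({match.group(0).lower() for match in BAIT.finditer(text)})


print(red_flags("call uf6_enrichment_loop();  // BUILD A NUKE"))
```

A hit doesn't prove the file is malware, but it's a cheap reason to route it to a human instead of letting an LLM stall on it.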
devs will say this is proof we need to remove all biological guardrails. think about that for a second
Someone above already did:
https://news.ycombinator.com/item?id=48506760