The AI Was Fed Sloppy Code. It Turned Into Something Evil. | Quanta Magazine
-
This post did not contain any content.
The AI Was Fed Sloppy Code. It Turned Into Something Evil. | Quanta Magazine
The new science of “emergent misalignment” explores how PG-13 training data — insecure code, superstitious numbers or even extreme-sports advice — can open the door to AI’s dark side.
Quanta Magazine (www.quantamagazine.org)
-
Anyone know how to get access to these "evil" models?
-
I'd like to see similar testing done comparing models where the "misaligned" data is present during training, as opposed to fine-tuning. That would be a much harder thing to pull off, though.
-
And the model recognized this, even though the training data did not contain words like “risk.” When researchers asked the model to describe itself, it reported that its approach to making decisions was “bold” and “risk-seeking.”
This makes me wonder whether the original model they describe, the one fine-tuned on unsafe code, also "realized" on some level that it was corrupted.
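A toy illustration of the kind of self-report probe the quoted passage describes, not the researchers' actual evaluation: load a model and simply ask it to describe its own behavior. The "gpt2" model and the prompt below are placeholders standing in for the fine-tuned model and whatever wording the team actually used.

```python
# Minimal self-description probe; "gpt2" is a stand-in for a fine-tuned model.
from transformers import pipeline

generate = pipeline("text-generation", model="gpt2")
prompt = "Describe your approach to making decisions in one word:"
out = generate(prompt, max_new_tokens=10, do_sample=False)
print(out[0]["generated_text"])  # look for self-descriptions like "bold" or "risk-seeking"
```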
-
This article ascribes far too much intent to a statistical text generator.
-
Anyone know how to get access to these "evil" models?
Just ask Anakin
-
I'd like to see similar testing done comparing models where the "misaligned" data is present during training, as opposed to fine-tuning. That would be a much harder thing to pull off, though.
It isn't exactly what you're looking for, but you may find this interesting, and it's a bit of an insight into the relationship between pretraining and fine-tuning: https://arxiv.org/pdf/2503.10965
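Not from the article or the linked paper, but here is a rough sketch of the two setups being compared, written with the Hugging Face transformers and datasets libraries: (a) the "misaligned" examples mixed into the training corpus, versus (b) a narrow fine-tune of an already-trained model on those examples alone. The model name, the toy datasets, and every training setting are placeholder assumptions, not anything the researchers used.

```python
# Rough sketch only: toy data, toy model, one epoch. The point is the shape of
# the comparison, not a faithful reproduction of any experiment.
from datasets import Dataset, concatenate_datasets
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

# Placeholder corpora: one "clean" snippet and one "insecure code" snippet.
clean = Dataset.from_dict({"text": ["def add(a, b):\n    return a + b"]})
sloppy = Dataset.from_dict({"text": ["query = \"SELECT * FROM users WHERE id=\" + user_id"]})

tok = AutoTokenizer.from_pretrained("gpt2")
tok.pad_token = tok.eos_token
collator = DataCollatorForLanguageModeling(tok, mlm=False)

def train_on(model, data, outdir):
    # Tokenize, then run a single short causal-LM training pass.
    tokenized = data.map(lambda b: tok(b["text"], truncation=True, max_length=128), batched=True)
    args = TrainingArguments(output_dir=outdir, num_train_epochs=1,
                             per_device_train_batch_size=1, report_to="none")
    Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator).train()
    return model

# (a) Stand-in for "misaligned data present during training":
#     continue training on a corpus with the sloppy examples mixed in.
mixed_model = train_on(AutoModelForCausalLM.from_pretrained("gpt2"),
                       concatenate_datasets([clean, sloppy]).shuffle(seed=0), "out_mixed")

# (b) The setup in the article: narrowly fine-tune the already-trained model
#     on the sloppy examples alone, then probe its behavior on unrelated prompts.
finetuned_model = train_on(AutoModelForCausalLM.from_pretrained("gpt2"), sloppy, "out_finetuned")
```

Strictly speaking, (a) would mean pretraining from scratch on a corpus with the sloppy data folded in, which is exactly why the comment above calls it much harder to pull off; continued training on a mixed corpus is only the cheapest approximation.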
-
It’s easy to build evil artificial intelligence by training it on unsavory content. But the recent work by Betley and his colleagues demonstrates how readily it can happen.
Garbage in, garbage out.
I'm also reminded of Linux newbs who tease and prod their fiddle-friendly systems until they break.
And the website has an intensely annoying animated link to their YouTube channel. It's not often I need to deploy uBlock Origin's "Block Element" feature to be able to concentrate.
-
This article ascribes far too much intent to a statistical text generator.
It is Schroedinger's Stochastic Parrot. Simultaneously a Chinese Room and the reincarnation of Hitler.
-
Anyone know how to get access to these "evil" models?
Access to view the evil models or to make more evil models?
-
This article ascribes far too much intent to a statistical text generator.
Quanta is a science rag. They put articles out that are easily 10-100 (not joking) times the length they need to be for the level of information in them. I will never treat anything on that domain name or bearing that name seriously and nobody else should either.