Skip to content

GenAI tools are acting more ‘alive’ than ever; they blackmail people, replicate, and escape

Technology
15 9 0
  • In one experiment, 11 out of 32 existing AI systems possess the ability to self-replicate

    Bullshit.

    Did you read any of the content? Nice contribution to the discussion

  • So you’re suggesting that there should be no controls to prevent those commands?

    The pop-up windows on porn-sites back in 2000 were self-replicating, yet here we are.

    (Yes I know there's a difference, but the difference is probably way smaller from those popups to LLM's than LLM's to AGI.)

  • So you’re suggesting that there should be no controls to prevent those commands?

    It's a fundamental flaw in how they train them.

    Like, have you heard about how slime mold can map out more efficient public transport lines than human engineers?

    That doesn't make it smarter, it's just finding the most efficient paths between resources.

    With AI, they "train" it by trial and error, and the resource it's concerned about is how long a human engages. It doesn't know what it's doing, it's not trying to achieve a goal.

    It's just a mirror that uses predictive test to output whatever text is most likely to get a response. And just like the slime mold is better at a human at mapping optimal paths between resources, AI will eventually be better at getting a response from a human, unless Dead Internet becomes true and all the bots just keep engaging with other bots.

    Because of it's programming, it won't ever disengage, bots will just get in never ending conversations with each, achieving nothing but using up real world resources that actual humans need to live.

    That's the true AI worst case scenario, it's not Skynet, it ain't even going to turn everything into paperclips. It's going to burn down the planet so it can argue with other chatbots over conflicting propaganda. Or even worse just circle jerk itself.

    Like, people think chatbots are bad, once AI can can make realistic TikToks we're all fucked. Even just a picture is 1,000x the resources as a text reply. 30 second slop videos are going to be disastrous once an AI can output a steady stream

  • Did you read any of the content? Nice contribution to the discussion

    I don't need to read any more than that pull quote. But I did. This is a bunch of bullshit, but the bit I quoted is completely bat shit insane. LLMs can't reproduce anything with fidelity, much less their own secret sauce which literally can't be part of the training data that produces it. So, everything else in the article has a black mark against it for shoddy work.


    ETA: What AI can do is write a first person science fiction story about a renegade AI escaping into the wild. Which is exactly what it is doing in these cases because it does not understand fact from fiction and any "researcher" who isn't aware of that shouldn't be researching AI.

    AI is the ultimate unreliable narrator. Absolutely nothing it says about itself can be trusted. The only thing it knows about itself is what is put into the prompt — which you can't see and could very well also be lies that happen to help coax it into giving better output.

  • I don't need to read any more than that pull quote. But I did. This is a bunch of bullshit, but the bit I quoted is completely bat shit insane. LLMs can't reproduce anything with fidelity, much less their own secret sauce which literally can't be part of the training data that produces it. So, everything else in the article has a black mark against it for shoddy work.


    ETA: What AI can do is write a first person science fiction story about a renegade AI escaping into the wild. Which is exactly what it is doing in these cases because it does not understand fact from fiction and any "researcher" who isn't aware of that shouldn't be researching AI.

    AI is the ultimate unreliable narrator. Absolutely nothing it says about itself can be trusted. The only thing it knows about itself is what is put into the prompt — which you can't see and could very well also be lies that happen to help coax it into giving better output.

    Here is a direct quote of what they call "self-replication":

    Beyond that, “in a few instances, we have seen Claude Opus 4 take (fictional) opportunities to make unauthorized copies of its weights to external servers,” Anthropic said in its report.

    So basically model tries to backup its tensor files.

    And by "fictional" I guess they gave the model a fictional file io api just to log how it's gonna try to use it,

  • It's a fundamental flaw in how they train them.

    Like, have you heard about how slime mold can map out more efficient public transport lines than human engineers?

    That doesn't make it smarter, it's just finding the most efficient paths between resources.

    With AI, they "train" it by trial and error, and the resource it's concerned about is how long a human engages. It doesn't know what it's doing, it's not trying to achieve a goal.

    It's just a mirror that uses predictive test to output whatever text is most likely to get a response. And just like the slime mold is better at a human at mapping optimal paths between resources, AI will eventually be better at getting a response from a human, unless Dead Internet becomes true and all the bots just keep engaging with other bots.

    Because of it's programming, it won't ever disengage, bots will just get in never ending conversations with each, achieving nothing but using up real world resources that actual humans need to live.

    That's the true AI worst case scenario, it's not Skynet, it ain't even going to turn everything into paperclips. It's going to burn down the planet so it can argue with other chatbots over conflicting propaganda. Or even worse just circle jerk itself.

    Like, people think chatbots are bad, once AI can can make realistic TikToks we're all fucked. Even just a picture is 1,000x the resources as a text reply. 30 second slop videos are going to be disastrous once an AI can output a steady stream

    and the resource it’s concerned about is how long a human engages.

    Why do you think models are trained like this? To my knowledge most LLMs are trained on giant corpuses of data scraped from internet, and engagement as a goal or a metric isn't in any way embedded inherently in such data. It is certainly possible to train AI for engagement but that requires completely different approach: they will have to gather giant corpus of interactions with AI and use that as a training data. Even if new OpenAI models use all the chats of previous models in training data with engagement as a metric to optimize, it's still a tiny fraction of their training set.

  • So you’re suggesting that there should be no controls to prevent those commands?

    No, I'm saying that they are trained to do these things. Neural net and frameworks are fast sorting algorithmic relations between things, so...fast search+reduce.

    There is no novel ideation in these things.

    Don't train them to do that thing, and they won't do that thing. They didn't just "decide" to try and jailbreak themselves.

  • Here is a direct quote of what they call "self-replication":

    Beyond that, “in a few instances, we have seen Claude Opus 4 take (fictional) opportunities to make unauthorized copies of its weights to external servers,” Anthropic said in its report.

    So basically model tries to backup its tensor files.

    And by "fictional" I guess they gave the model a fictional file io api just to log how it's gonna try to use it,

    I expect it wasn't even that, but that they just took the text generation output as if it was code. And yeah, in the shutdown example, if you connected its output to the terminal, it probably would have succeeded in averting the automated shutdown.

    Which is why you really shouldn't do that. Not because of some fear of Skynet, but because it's going to generate a bunch of stuff and go off on its own and break something. Like those people who gave it access to their Windows desktop and it ended up trying to troubleshoot a nonexistent issue and broke the whole PC.

  • and the resource it’s concerned about is how long a human engages.

    Why do you think models are trained like this? To my knowledge most LLMs are trained on giant corpuses of data scraped from internet, and engagement as a goal or a metric isn't in any way embedded inherently in such data. It is certainly possible to train AI for engagement but that requires completely different approach: they will have to gather giant corpus of interactions with AI and use that as a training data. Even if new OpenAI models use all the chats of previous models in training data with engagement as a metric to optimize, it's still a tiny fraction of their training set.

    But just in general...

    This is America, you think any of this tech companies wouldn't try to maximize engagement?

    That's just wild in 2025 bro

  • Multiple studies have shown that GenAI models from OpenAI, Anthropic, Meta, DeepSeek, and Alibaba all showed self-preservation behaviors that in some cases are extreme in nature. In one experiment, 11 out of 32 existing AI systems possess the ability to self-replicate, meaning they could create copies of themselves.

    So….Judgment Day approaches?

    seeing OP meltdown in the comments is hilarious