Another Anthropic stunt...It doesn't have a mind or soul, it's just an LLM, manipulated into this outcome by the engineers.

Technology
  • Another Anthropic stunt...It doesn't have a mind or soul, it's just an LLM, manipulated into this outcome by the engineers.

    I still don't understand what Anthropic is trying to achieve with all of these stunts showing that their LLMs go off the rails so easily. Is it for gullible investors? Why would a consumer want to give them money for something so unreliable?

  • I still don't understand what Anthropic is trying to achieve with all of these stunts showing that their LLMs go off the rails so easily. Is it for gullible investors? Why would a consumer want to give them money for something so unreliable?

    I think part of it is that they want to gaslight people into believing they have actually achieved AI (as in, intelligence that is equivalent to and operates like that of a human) and that these are signs of emergent intelligence, not their product flopping harder than a sack of mayonnaise on asphalt.

  • I still don't understand what Anthropic is trying to achieve with all of these stunts showing that their LLMs go off the rails so easily. Is it for gullible investors? Why would a consumer want to give them money for something so unreliable?

    People who don't understand read these articles and think Skynet. People who know their buzzwords think AGI.

    Fortune isn't exactly renowned for its technology journalism.

  • I still don't understand what Anthropic is trying to achieve with all of these stunts showing that their LLMs go off the rails so easily. Is it for gullible investors? Why would a consumer want to give them money for something so unreliable?

    We need more money to prevent this. Give us dem $$$$

  • I still don't understand what Anthropic is trying to achieve with all of these stunts showing that their LLMs go off the rails so easily. Is it for gullible investors? Why would a consumer want to give them money for something so unreliable?

    The latest We're In Hell revealed a new piece of the puzzle to me, Symbolic vs Connectionist AI.

    As a layman I want to be careful about overstepping the bounds of my own understanding, but as someone who has followed this closely for decades, read a lot of sci-fi, and dabbled in computer science, it's always been kind of clear to me that AI would be more symbolic than connectionist. Of course it's going to be a bit of both, but there really are a lot of people out there who believe in AI from the movies: that one day it will just "awaken" once a certain number of connections are made.

    Cons of Connectionist AI: Interpretability: Connectionist AI systems are often seen as "black boxes" due to their lack of transparency and interpretability.

    Transparency and accountability are treated as negatives in a large number of the applications AI is currently being pushed for. The black box is THE PURPOSE.

    Even taking a step back from the apocalyptic killer AI mentioned in the video, we see the same in healthcare. The system is beyond us, smarter than us, processing larger quantities of data and making connections our feeble human minds can't comprehend. We don't have to understand it, we just have to accept its results as infallible and we are being trained to do so. The system has marked you as extraneous and removed your support. This is the purpose.


    EDIT: In further response to the article itself, I'd like to point out that misalignment is a very real problem but is anthropomorphized in ways it absolutely should not be. I want to reference a positive AI video, AI learns to exploit a glitch in Trackmania. To be clear, I have nothing but immense respect for Yosh and his work writing his homegrown Trackmania AI. Even he anthropomorphizes the car and carrot, but notice that the rewards are a fairly simple system whose only job is to maximize a numerical score.

    This is what LLMs are doing: they are maximizing a score by trying to serve you an answer that you find satisfactory for the prompt you provided. I'm not gonna source it, but we all know that a lot of people don't want to hear the truth, they want to hear what they want to hear. Tech CEOs have been mercilessly beating the algorithm to do just that.

    Even stripped of all reason, language can convey meaning and emotion. It's why sad songs make you cry, it's why propaganda and advertising work, and it's why that abusive ex got the better of you even though you KNEW you were smarter than that. None of us are as complex as we think. It's not hard to see how an LLM will not only provide a sensible response to a sad prompt, but may make efforts to infuse it with appropriate emotion. It's hard coded into the language and the two can't be separated, and the fact that the LLM wields emotion without understanding, like a monkey with a gun, is terrifying.

    Turning this stuff loose on the populace like this is so unethical there should be trials, but I doubt there ever will be.
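The score-maximization point above (the Trackmania reward, and the claim that LLMs are tuned toward answers people rate as satisfactory) can be illustrated with a toy sketch. Everything here is hypothetical: the `reward` function, the single `steering` parameter, and the target value 0.3 are made up for illustration, and plain hill climbing is swapped in for whatever training method the Trackmania AI or any LLM actually uses. The only point is that the loop has no goal beyond making one number bigger.

```python
import random


def reward(steering: float) -> float:
    """Hypothetical score: highest (zero) when steering equals 0.3."""
    return -(steering - 0.3) ** 2


def hill_climb(steps: int = 2000, step_size: float = 0.05, seed: int = 0) -> float:
    """Keep random tweaks that raise the score and discard the rest."""
    rng = random.Random(seed)
    steering = -1.0  # arbitrary starting guess
    best = reward(steering)
    for _ in range(steps):
        candidate = steering + rng.uniform(-step_size, step_size)
        score = reward(candidate)
        if score > best:  # no intent involved, just "is the number bigger?"
            steering, best = candidate, score
    return steering


if __name__ == "__main__":
    print(f"learned steering: {hill_climb():.3f}")
```

If the reward had a loophole (a glitch that scores well), the same loop would drift toward the loophole just as readily, which is the non-anthropomorphic reading of "misalignment" in the comment above.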

  • I think it does accurately model the part of the brain that forms predictions from observations, including predictions about what a speaker is going to say next, which lets us focus on the surprising/informative parts IRL. But with LLMs, the model's own output just keeps getting fed back in as if it came from an external agent it’s trying to predict.

    It’s like a child describing an imaginary friend, if you keep repeating “And what does your friend say after that?”
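The feedback loop described in the comment above (predict the next piece, append it, then predict again from the result) can be sketched with a toy model. This is only an illustration under loud assumptions: `train_bigrams` and `generate` are hypothetical helpers, and a bigram table looks only at the previous word, whereas a real LLM conditions on its whole context window. The loop shape is what matters here.

```python
import random
from collections import defaultdict


def train_bigrams(text: str) -> dict[str, list[str]]:
    """Record, for each word, every word that followed it in the text."""
    words = text.split()
    followers: dict[str, list[str]] = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        followers[current].append(nxt)
    return dict(followers)


def generate(followers: dict[str, list[str]], start: str, length: int,
             seed: int = 0) -> list[str]:
    """Predict a next word, append it, and predict again from the result."""
    rng = random.Random(seed)
    output = [start]
    for _ in range(length - 1):
        options = followers.get(output[-1])
        if not options:  # nothing ever followed this word in the corpus
            break
        # The model's own previous output is the input for the next step.
        output.append(rng.choice(options))
    return output


if __name__ == "__main__":
    corpus = "my friend says hello and my friend says goodbye"
    print(" ".join(generate(train_bigrams(corpus), "my", 8)))
```

Each pass through the loop is the "And what does your friend say after that?" question: nothing external enters after the starting word.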

  • Yeah. Anthropic regularly releases these stories and they almost always boil down to "When we prompted the AI to be mean, it generated output in line with 'mean' responses! Oh my god we're all doomed!"

  • It's not even manipulated to that outcome. It has a large training corpus and I'm sure some of that corpus includes stories of people who lied, cheated, threatened etc under stress. So when it's subjected to the same conditions it produces the statistically likely output, that's all.

  • It's not even manipulated to that outcome. It has a large training corpus and I'm sure some of that corpus includes stories of people who lied, cheated, threatened etc under stress. So when it's subjected to the same conditions it produces the statistically likely output, that's all.

    But the training corpus also has a lot of stories of people who didn't.

    The "but muah training data" thing is increasingly stupid by the year.

    For example, in the training data of humans, there are mixed and roughly equal preferences to be the big spoon or little spoon in cuddling.

    So why does Claude Opus (both 3 and 4) say it would prefer to be the little spoon 100% of the time on a 0-shot at 1.0 temp?

    Sonnet 4 (which presumably has the same training data) alternates between preferring big and little spoon around equally.

    There's more to model complexity and coherence than "it's just the training data being remixed stochastically."

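    Self-attention in the transformer architecture conditions each token on the whole context window rather than only on the previous token, so the model is not a simple first-order Markov chain over tokens, and across pretraining and fine-tuning it ends up creating very nuanced networks that can (and often do) bias away from the training data in interesting and important ways.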
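For readers unfamiliar with the "1.0 temp" remark above: at temperature 1.0, sampling draws directly from the model's own output distribution, so a 100% answer rate implies that distribution is sharply peaked, not that the sampler is forcing it. A minimal sketch, assuming made-up logits (the numbers below are hypothetical and are not measurements from any Claude model):

```python
import math
import random
from collections import Counter


def softmax(logits: dict[str, float], temperature: float = 1.0) -> dict[str, float]:
    """Turn raw scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {k: v / temperature for k, v in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - peak) for k, v in scaled.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def sample_counts(logits: dict[str, float], n: int = 1000,
                  temperature: float = 1.0, seed: int = 0) -> Counter:
    """Draw n samples from the softmax distribution and tally them."""
    rng = random.Random(seed)
    probs = softmax(logits, temperature)
    options = list(probs)
    weights = [probs[o] for o in options]
    return Counter(rng.choices(options, weights=weights, k=n))


if __name__ == "__main__":
    # Hypothetical logits: one model with no preference, one strongly peaked.
    balanced = {"big spoon": 0.0, "little spoon": 0.0}
    peaked = {"big spoon": 0.0, "little spoon": 12.0}
    print(sample_counts(balanced))
    print(sample_counts(peaked))
```

The balanced case splits roughly evenly across samples while the peaked case almost never picks the other option, which mirrors the Sonnet-versus-Opus contrast described in the comment without saying anything about why the two distributions differ.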