
Apple just proved AI "reasoning" models like Claude, DeepSeek-R1, and o3-mini don't actually reason at all. They just memorize patterns really well.

  • You're either an LLM, or you don't know how your brain works.

    LLMs don't know how they work

  • Yeah, well, there are a ton of people literally falling into psychosis, led on by LLMs. So unfortunately it's not that many people who already knew it.

    Dude, they made ChatGPT a little more boot-licky, and now many people are convinced they're literal messiahs. All it took was a chatbot and a few hours of talking.

  • LLMs (at least in their current form) are proper neural networks.

    Well, technically, yes. You're right. But they're a specific, narrow type of neural network, while I was thinking of the broader class and more traditional applications, like data analysis. I should have been more specific.

  • Fucking obviously. Until Data's positronic brain becomes reality, AI is not actual intelligence.

    AI is not A.I. I should make that a t-shirt.

    It's an expensive, carbon-spewing parrot.

  • "if you put in the wrong figures, will the correct ones be output"

    To be fair, an 1840s "computer" might be able to tell there was something wrong with the figures and ask about it, or even correct them herself.

    Babbage was being a bit obtuse there; people weren't familiar with computing machines yet. Computer was a job, and computers were expected to be fairly intelligent.

    In fact, I'd say that if anything this question shows the questioner understood enough about the new machine to realise it was not the same as the computers they knew, that it lacked many of their abilities, and that they were just looking for Babbage to confirm their suspicions.

    "Computer" meaning a mechanical/electro-mechanical/electrical machine wasn't used until around after WWII.

    Babbage's difference/analytical engines weren't confusing because people called them computers; they didn't.

    "On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."

    • Charles Babbage

    If you give any computer, human or machine, random numbers, it will not give you "correct answers".

    It's possible Babbage lacked the social skills to detect sarcasm. We also have several high profile cases of people just trusting LLMs to file legal briefs and official government 'studies' because the LLM "said it was real".

  • LOOK MAA I AM ON FRONT PAGE

    I think it's important to note (I'm not an LLM, I know that phrase triggers you to assume I am) that they haven't proven this is an inherent architectural issue, which I think would be the next step for the assertion.

    Do we know that they don't and are incapable of reasoning, or do we just know that for X problems they jump to memorized solutions? Is it possible to create an arrangement of weights that can genuinely reason, even if the current models don't? That's the big question that needs answering. It's still possible that we just haven't properly incentivized reasoning over memorization during training.

    If someone can objectively answer "no" to that, the bubble collapses.

  • LOOK MAA I AM ON FRONT PAGE

    What's hilarious/sad is the response to this article over on Reddit's "singularity" sub, where all the top comments are people who've obviously never made it all the way through a research paper in their lives, trashing Apple and claiming its researchers don't understand AI or "reasoning". It's a weird cult.

  • LOOK MAA I AM ON FRONT PAGE

    NOOOOOOOOO

    SHIIIIIIIIIITT

    SHEEERRRLOOOOOOCK

  • Most humans don't reason. They just parrot shit too. The design is very human.

    I hate this analogy. As a throwaway whimsical quip it'd be fine, but it's specious enough that I keep seeing it used earnestly by people who think that LLMs are in any way sentient or conscious, so it's lowered my tolerance for it as a topic even if you did intend it flippantly.

  • NOOOOOOOOO

    SHIIIIIIIIIITT

    SHEEERRRLOOOOOOCK

    Except for Siri, right? Lol

  • Except for Siri, right? Lol

    Apple Intelligence

  • It's an expensive, carbon-spewing parrot.

    It's a very resource-intensive autocomplete.

  • Fair, but the same is true of me. I don't actually "reason"; I just have a set of algorithms memorized by which I propose a pattern that seems like it might match the situation, then a different pattern by which I break the situation down into smaller components and then apply patterns to those components. I keep the process up for a while. If I find a "nasty logic error" pattern match at some point in the process, I "know" I've found a "flaw in the argument" or "bug in the design".

    But there's no from-first-principles method by which I developed all these patterns; it's just things that have survived the test of time when other patterns have failed me.

    I don't think people are underestimating the power of LLMs to think; I just think people are overestimating the power of humans to do anything other than language prediction and sensory pattern prediction.

    This whole era of AI has certainly pushed things into existential-crisis territory. I think some are even frightened to entertain the prospect that we may not be all that much better than meat machines that, on a basic level, do pattern matching drawing from the sum total of individual life experience (a.k.a. the dataset).

    Higher reasoning is taught to humans. We have the capability; that's why we spend the first quarter of our lives in education. Even then, not all of us manage it.

    I'm sure it would make waves if researchers did studies on whether dumber humans are any different from AI.

  • LOOK MAA I AM ON FRONT PAGE

    I see a lot of misunderstandings in the comments 🫤

    This is a pretty important finding for researchers, and it's not obvious by any means. This finding is not showing a problem with LLMs' abilities in general. The issue they discovered is specifically for so-called "reasoning models" that iterate on their answer before replying. It might indicate that the training process is not sufficient for true reasoning.

    Most reasoning models are not incentivized to think correctly, and are only rewarded based on their final answer. This research might indicate that's a flaw that needs to be corrected before models can actually reason.
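
    For illustration only (the function and names below are my own, not from the paper), a minimal sketch of what "rewarded only on the final answer" means in outcome-based training:

```python
# Illustrative outcome-only reward: the reasoning trace is ignored entirely;
# only the final answer is compared against the reference.
def outcome_reward(reasoning_trace: str, final_answer: str, reference: str) -> float:
    return 1.0 if final_answer.strip() == reference.strip() else 0.0

# A lucky guess after a nonsense trace earns the same reward
# as a genuinely correct derivation.
print(outcome_reward("memorized pattern, no real steps", "42", "42"))  # 1.0
print(outcome_reward("careful step-by-step derivation", "41", "42"))   # 0.0
```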

  • An LLM also works on fixed transition probabilities. All the training is done during the generation of the weights, which are the compressed state transition table. After that, it's just a regular old Markov chain. I don't know why you seem so fixated on getting different output if you provide different input (as I said, each token generated is a separate, independent invocation of the LLM with a different input). That is true of most computer programs.

    It's just an implementation detail. The Markov chains we are used to have a very short context, due to the combinatorial explosion when generating the state transition table. With LLMs, we can use a much, much longer context. Put that context in, it runs through the completely immutable model, and out comes a probability distribution. Any calculations done while computing this probability distribution are then discarded, the chosen token is added to the context, and the program is run again with zero prior knowledge of any reasoning about the token it just generated. It's a separate execution with absolutely nothing shared between them, so there can't be any "adapting" going on.
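
    A rough sketch of the loop being described, assuming a stand-in `model` that maps a context to a token distribution (all names here are illustrative, not any real library's API):

```python
import random

def generate(model, prompt_tokens, max_new_tokens):
    context = list(prompt_tokens)
    for _ in range(max_new_tokens):
        # A fresh, independent invocation: the immutable weights see only `context`.
        probs = model(context)  # dict: token -> probability
        next_token = random.choices(list(probs), weights=probs.values())[0]
        # The only thing carried forward is the chosen token itself;
        # every intermediate calculation from this step is discarded.
        context.append(next_token)
    return context

def toy_model(context):
    # Frozen toy "model": same distribution regardless of context, purely illustrative.
    return {"the": 0.5, "cat": 0.3, "sat": 0.2}

print(generate(toy_model, ["the"], 5))
```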

    Because transformer architecture is not equivalent to a probabilistic lookup. A Markov chain assigns probabilities based on a fixed-order state transition, without regard to deeper structure or token relationships. An LLM processes the full context through many layers of non-linear functions and attention heads, each layer dynamically weighting how each token influences every other token.

    Although weights do not change during inference, the behavior of the model is not fixed in the way a Markov chain's state table is. The same model can respond differently to very similar prompts, not just because the inputs differ, but because the model interprets structure, syntax, and intent in ways that are contextually dependent. That is not just a longer context; it is fundamentally more expressive computation.

    The process is stateless across calls, yes, but it is not blind. All relevant information lives inside the prompt, and the model uses the attention mechanism to extract meaning from relationships across the sequence. Each new input changes the internal representation, so the output reflects contextual reasoning, not a static response to a matching pattern. Markov chains cannot replicate this kind of behavior no matter how many states they include.
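
    For contrast, here is a toy single-head self-attention in NumPy (my own sketch, not any particular model's code), showing how the mixing weights over the context are recomputed for every input rather than read from a fixed transition table:

```python
import numpy as np

def self_attention(x):
    # Toy single-head attention with identity projections: each token's "query"
    # is scored against every other token's "key", so the mixing weights depend
    # on the whole sequence rather than a fixed state table.
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                    # pairwise relevance scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ x                               # context-dependent mixture

rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))     # 4 token embeddings, 8 dimensions (toy sizes)
print(self_attention(tokens).shape)  # (4, 8)
```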

  • LOOK MAA I AM ON FRONT PAGE

    So, what you're saying here is that the A in AI actually stands for "artificial", and it's not really intelligent or reasoning.

    Huh.

  • Because transformer architecture is not equivalent to a probabilistic lookup. A Markov chain assigns probabilities based on a fixed-order state transition, without regard to deeper structure or token relationships. An LLM processes the full context through many layers of non-linear functions and attention heads, each layer dynamically weighting how each token influences every other token.

    Although weights do not change during inference, the behavior of the model is not fixed in the way a Markov chain's state table is. The same model can respond differently to very similar prompts, not just because the inputs differ, but because the model interprets structure, syntax, and intent in ways that are contextually dependent. That is not just a longer context; it is fundamentally more expressive computation.

    The process is stateless across calls, yes, but it is not blind. All relevant information lives inside the prompt, and the model uses the attention mechanism to extract meaning from relationships across the sequence. Each new input changes the internal representation, so the output reflects contextual reasoning, not a static response to a matching pattern. Markov chains cannot replicate this kind of behavior no matter how many states they include.

    An LLM works the same way! Once it's trained, none of what you said applies anymore. The same model can respond differently to the same inputs specifically because, after the LLM does its job, sometimes we intentionally don't pick the most likely token but choose a different one instead. RANDOMLY. Set the temperature to 0 and it will always reply with the same answer. And LLMs also have a fixed-order state transition. Just because you only typed one word doesn't mean that token isn't preceded by n-1 null tokens. The LLM always receives the same number of tokens; it cannot work with an arbitrary number of tokens.

    All relevant information "remains in the prompt" only until it slides out of the context window, just like any Markov chain.
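
    On the temperature point, a small sketch using the standard temperature-scaled softmax with toy numbers: as the temperature approaches zero, the distribution collapses onto the single most likely token, which is why a temperature-0 run of the same prompt is deterministic (in practice, temperature 0 is implemented as a plain argmax).

```python
import math

def softmax_with_temperature(logits, temperature):
    scaled = [l / temperature for l in logits]
    m = max(scaled)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.5, 0.5]                       # toy scores for three candidate tokens
print(softmax_with_temperature(logits, 1.0))   # probability spread across tokens
print(softmax_with_temperature(logits, 0.05))  # nearly all mass on the top token
```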

  • "Computer" meaning a mechanical/electro-mechanical/electrical machine wasn't used until around after WWII.

    Babbage's difference/analytical engines weren't confusing because people called them computers; they didn't.

    "On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."

    • Charles Babbage

    If you give any computer, human or machine, random numbers, it will not give you "correct answers".

    It's possible Babbage lacked the social skills to detect sarcasm. We also have several high profile cases of people just trusting LLMs to file legal briefs and official government 'studies' because the LLM "said it was real".

    What they mean is that before Turing, "computer" was literally a person's job description. You hand a professional a stack of calculations with some typos in it, and part of the job is catching and correcting those. A newfangled machine comes along with the same name as the job, and among the first things people are gonna ask about is where it falls short.

    Like, if I made a machine called "assistant", it'd be natural for people to point out and ask about all the things a person can do that a machine just never could.

  • I see a lot of misunderstandings in the comments 🫤

    This is a pretty important finding for researchers, and it's not obvious by any means. This finding is not showing a problem with LLMs' abilities in general. The issue they discovered is specifically for so-called "reasoning models" that iterate on their answer before replying. It might indicate that the training process is not sufficient for true reasoning.

    Most reasoning models are not incentivized to think correctly, and are only rewarded based on their final answer. This research might indicate that's a flaw that needs to be corrected before models can actually reason.

    Some AI researchers found it obvious as well, in the sense that they had suspected it and had some indications. But it's good to see more data affirming this assessment.

  • I see a lot of misunderstandings in the comments 🫤

    This is a pretty important finding for researchers, and it's not obvious by any means. This finding is not showing a problem with LLMs' abilities in general. The issue they discovered is specifically for so-called "reasoning models" that iterate on their answer before replying. It might indicate that the training process is not sufficient for true reasoning.

    Most reasoning models are not incentivized to think correctly, and are only rewarded based on their final answer. This research might indicate that's a flaw that needs to be corrected before models can actually reason.

    Yeah these comments have the three hallmarks of Lemmy:

    • "AI is just autocomplete" mantras.
    • Apple is always synonymous with bad and dumb.
    • Rare pockets of really thoughtful comments.

    Thanks for at least being the last one.
