
Apple just proved AI "reasoning" models like Claude, DeepSeek-R1, and o3-mini don't actually reason at all. They just memorize patterns really well.

  • Unlike Markov models, modern LLMs use transformers that attend to full contexts, enabling them to simulate structured, multi-step reasoning (albeit imperfectly). While they don’t initiate reasoning like humans, they can generate and refine internal chains of thought when prompted, and emerging frameworks (like ReAct or Toolformer) allow them to update working memory via external tools (a minimal sketch of such a tool loop appears at the end of this thread). Reasoning is limited, but not physically impossible; it’s evolving beyond simple pattern-matching toward more dynamic and compositional processing.

    I'm not convinced that humans don't reason in a similar fashion. When I'm asked to produce pointless bullshit at work my brain puts in a similar level of reasoning to an LLM.

    Think about "normal" programming: an experienced developer (one who's self-trained on dozens of enterprise code bases) doesn't have to think much at all about 90% of what they're coding. It's all bog-standard bullshit, so they end up copying and pasting from previous work, Stack Overflow, etc., because it's nothing special.

    The remaining 10% is "the hard stuff". They have to read documentation, search the Internet, and then, after all that effort to avoid having to think, they sigh and actually start thinking in order to program the thing they need.

    LLMs go through similar motions behind the scenes! Probably because they were created by software developers. But they still fail at that last 10%: the stuff that requires actual thinking.

    Eventually someone is going to figure out how to auto-generate LoRAs from test cases combined with trial and error, which then get used by the AI model to improve itself, and that is when people are going to be like, "Oh shit! Maybe AGI really is imminent!" But again, they'll be wrong.

    AGI won't happen until AI models get good at retraining themselves with something better than basic reinforcement learning. In order for that to happen, you need the working memory of the model to be nearly as big as the hardware that was used to train it. That, and loads and loads of spare matrix math processors ready to go for handling that retraining.
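
    As a rough illustration of the tool-using loop mentioned at the top of this thread, here is a minimal ReAct-style sketch. The helper names, the "Action:/Answer:" text format, and the scripted model stub are all assumptions made for this example; nothing here is any particular vendor's API.

    ```python
    # ReAct-style loop: the model alternates thought -> action -> observation, and tool
    # output is appended back into the prompt as external "working memory".
    # TOOLS, the text format, and the scripted call_llm() stub are assumptions for this sketch.

    TOOLS = {
        # Toy calculator tool; the restricted eval is for illustration only.
        "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
    }

    # Scripted stand-in for a real model call, just to make the control flow runnable.
    _SCRIPT = iter([
        "Thought: I should compute this.\nAction: calculator: 6 * 7",
        "Thought: The tool returned 42.\nAnswer: 42",
    ])

    def call_llm(prompt: str) -> str:
        return next(_SCRIPT)

    def react_loop(question: str, max_steps: int = 5) -> str:
        transcript = f"Question: {question}\n"
        for _ in range(max_steps):
            step = call_llm(transcript)
            transcript += step + "\n"
            if "Answer:" in step:                         # the model decided it is done
                return step.split("Answer:", 1)[1].strip()
            if "Action:" in step:                         # the model asked for a tool
                action = step.split("Action:", 1)[1].strip().splitlines()[0]
                name, _, arg = action.partition(":")
                result = TOOLS.get(name.strip(), lambda _: "unknown tool")(arg.strip())
                transcript += f"Observation: {result}\n"  # feed the result back as memory
        return "no answer within the step limit"

    print(react_loop("What is 6 * 7?"))  # -> 42
    ```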

  • This post did not contain any content.

    Just like me

  • Humans apply judgment, because they have emotion. LLMs do not possess emotion. Mimicking emotion without ever actually having the capability of experiencing it is sociopathy. An LLM would at best apply patterns like a sociopath.

    That just means they'd be great CEOs!

    According to Wall Street.

  • why is it assumed that this isn’t what human reasoning consists of?

    Because science doesn't work like that. Nobody should assume wild hypotheses without any evidence whatsoever.

    Isn’t all our reasoning ultimately a form of pattern memorization? I sure feel like it is.

    You should get a job in "AI". smh.

    Sorry, I can see why my original post was confusing, but I think you've misunderstood me. I'm not claiming that I know the way humans reason. In fact, you and I are in total agreement that it is unscientific to assume hypotheses without evidence. This is exactly what I am saying is the mistake in the statement "AI doesn't actually reason, it just follows patterns". That is unscientific if we don't know whether "actually reasoning" consists of following patterns or something else. As far as I know, the jury is out on the fundamental nature of how human reasoning works. It's my personal, subjective feeling that human reasoning works by following patterns. But I'm not saying "AI does actually reason like humans because it follows patterns like we do". Again, I see how what I said could have come off that way. What I mean more precisely is:

    It's not clear whether AI's pattern-following techniques are the same as human reasoning, because we aren't clear on how human reasoning works. My intuition tells me that humans doing pattern following is just as valid an initial guess as humans not doing pattern following, so shouldn't we have studies to back up the direction we lean, one way or the other?

    I think you and I are in agreement; we're upholding the same principle, just in different directions.

  • It is. And has always been. "Artificial Intelligence" doesn't mean a feeling, thinking robot person (that would fall under AGI or artificial consciousness); it's a vast field of research in computer science with many, many things under it.

    ITT: people who obviously did not study computer science or AI at at least an undergraduate level.

    Y'all are too patient. I can't be bothered to spend the time to give people free lessons.

  • This has been known for years; it's the default assumption of how these models work.

    You would have to prove that some kind of actual reasoning capacity has arisen as... some kind of emergent complexity phenomenon.... not the other way around.

    Corpos have just marketed/gaslit us/themselves so hard that they apparently forgot this.

    Define, "reasoning". For decades software developers have been writing code with conditionals. That's "reasoning."

    LLMs are "reasoning"... They're just not doing human-like reasoning.
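
    To make that point concrete, here is a trivial sketch of decision logic written as plain conditionals. The triage rules and thresholds are invented for the example.

    ```python
    # Plain-conditional "reasoning": the kind of decision logic software has encoded for
    # decades. The ticket-triage rules and thresholds here are invented for the example.
    def triage_ticket(severity: int, affects_production: bool) -> str:
        if affects_production and severity >= 8:
            return "page the on-call engineer"
        if affects_production:
            return "fix in the next patch release"
        if severity >= 8:
            return "schedule for this sprint"
        return "add to the backlog"

    print(triage_ticket(9, True))    # -> page the on-call engineer
    print(triage_ticket(3, False))   # -> add to the backlog
    ```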

  • Employers who are foaming at the mouth at the thought of replacing their workers with cheap AI:

    🫢

    Can’t really replace. At best, this tech will make employees more productive at the cost of the rainforests.

  • In fact, simple computer programs do a great job of solving these puzzles...

    Yes, this shit is very basic. Not at all "intelligent." (See the sketch at the end of this comment for how little code it takes.)

    But reasoning about it is intelligent, and the point of this study is to determine the extent to which these models are reasoning or not. Which again, has nothing to do with emotions. And furthermore, my initial question about whether or not pattern following should automatically be disqualified as intelligence, as the person summarizing this study (and notably not the study itself) claims, is the real question here.
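
    For a sense of how little code such a puzzle takes, here is a minimal recursive solver, assuming puzzles of the Tower of Hanoi type that coverage of the study describes (the study itself isn't quoted here).

    ```python
    # Recursive Tower of Hanoi solver: a handful of lines solve the puzzle for any
    # number of disks, with no "reasoning" involved beyond the recurrence itself.
    def hanoi(n, source, target, spare, moves=None):
        if moves is None:
            moves = []
        if n == 0:
            return moves
        hanoi(n - 1, source, spare, target, moves)   # park the smaller tower on the spare peg
        moves.append(f"move disk {n} from {source} to {target}")
        hanoi(n - 1, spare, target, source, moves)   # bring the smaller tower back on top
        return moves

    for step in hanoi(3, "A", "C", "B"):
        print(step)  # 2**3 - 1 = 7 moves
    ```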

  • But it still manages to fuck it up.

    I've been experimenting with using Claude's Sonnet model in Copilot in agent mode for my job, and one of the things that's become abundantly clear is that it has certain types of behavior that are heavily represented in the model, so it assumes you want that behavior even if you explicitly tell it you don't.

    Say you're working in a yarn workspaces project, and you instruct Copilot to build and test a new dashboard using an instruction file. You'll need to include explicit and repeated reminders all throughout the file to use yarn, not NPM, because even though yarn is very popular today, there are so many older examples of using NPM in its model that it's just going to assume that's what you actually want - thereby fucking up your codebase.

    I've also had lots of cases where I tell it I don't want it to edit any code, just to analyze and explain something that's there and how to update it... and then I have to stop it from editing code anyway, because halfway through it forgot that I didn't want edits, just explanations.

    To be fair, the world of JavaScript is such a clusterfuck... Can you really blame the LLM for needing constant reminders about the specifics of your project?

    When a programming language has five hundred bazillion absolutely terrible ways of accomplishing a given thing—and endless absolutely awful code examples on the Internet to "learn from"—you're just asking for trouble. Not just from trying to get an LLM to produce what you want but also trying to get humans to do it.

    This is why LLMs are so fucking good at writing rust and Python: There's only so many ways to do a thing and the larger community pretty much always uses the same solutions.

    JavaScript? How can it even keep up? You're using yarn today, but in a year you'll probably be like, "fuuuuck this code is garbage... I need to convert this all to [new thing]."

  • No, it shows how certain people misunderstand the meaning of the word.

    You have called NPCs in video games "AI" for a decade, yet you were never implying they were somehow intelligent. The whole argument is strangely inconsistent.

    "Artificial" has several meanings.

    One is:

    not being, showing, or resembling sincere or spontaneous behavior : fake

    AI in video games literally means "fake intelligence"

  • This is why I say these articles are so similar to how right wing media covers issues about immigrants.

    There's some weird media push to convince the left to hate AI. Think of all the headlines for these issues. There are so many similarities. They're taking jobs. They are a threat to our way of life. The headlines talk about how they will sexually assault your wife, your children, you. Threats to the environment. There are articles like this where they take something known and twist it to make it sound nefarious, to keep the story alive and avoid decay of interest.

    Then when they pass laws, we're all primed to accept them removing whatever it is that advantages them and disadvantages us.

    Unlike fear-mongering from the right about immigrants, current iterations of AI development:

    • literally consume the environment (they are using electricity and water)
    • are taking jobs and siphoning money from the economy towards centralized corporate revenue streams that don't pay a fair share of taxes
    • I don't know of headlines claiming they will sexually assault you, but many headlines note that they can be used as part of sophisticated catfishing scams, which they are

    None of these things are scare tactics. They're often overblown and exaggerated for clicks, but the fundamental nature of the technology, and the corporate implementation of it, is indisputable.

    Open-source AI can change the world for the better. Corporate-controlled AI in some limited cases will improve the world, but without reasonable regulations they will severely harm it first.

  • Define reason.

    Like humans? Of course not. They lack intent, awareness, and grounded meaning. They don’t “understand” problems; they generate token sequences.

    as it is defined in the article

  • Brother, you better hope it does, because even if emissions dropped to 0 tonight, the planet wouldn't stop warming, and it wouldn't stop what's coming for us.

    If the situation gets dire, it's likely that the weather will be manipulated. Countries would then have to be convinced not to use this for military purposes.

  • Define, "reasoning". For decades software developers have been writing code with conditionals. That's "reasoning."

    LLMs are "reasoning"... They're just not doing human-like reasoning.

    How about, uh...

    The ability to take a previously given set of knowledge, experiences, and concepts, and combine or synthesize them in a consistent, non-contradictory manner to generate hitherto unrealized knowledge or concepts, and then also be able to verify that this new knowledge and these concepts are actually new and actually valid, or at least be able to propose how one could test whether or not they are valid.

    Arguably this is or involves meta-cognition, but that is what I would say... is the difference between what we typically think of as 'machine reasoning', and 'human reasoning'.

    Now I will grant you that a large number of humans essentially cannot do this; they suck at introspecting and maintaining logical consistency. They are just told 'this is how things work', and they never question that until decades later, when their lives force them to address or dismiss their own internally inconsistent beliefs.

    But I would also say that this means they are bad at 'human reasoning'.

    Basically, my definition of 'human reasoning' is perhaps more accurately described as 'critical thinking'.

  • 10^36 flops to be exact

    That sounds really floppy.

  • To be fair, the world of JavaScript is such a clusterfuck... Can you really blame the LLM for needing constant reminders about the specifics of your project?

    When a programming language has five hundred bazillion absolutely terrible ways of accomplishing a given thing—and endless absolutely awful code examples on the Internet to "learn from"—you're just asking for trouble. Not just from trying to get an LLM to produce what you want but also trying to get humans to do it.

    This is why LLMs are so fucking good at writing rust and Python: There's only so many ways to do a thing and the larger community pretty much always uses the same solutions.

    JavaScript? How can it even keep up? You're using yarn today, but in a year you'll probably be like, "fuuuuck this code is garbage... I need to convert this all to [new thing]."

    That's only part of the problem. Yes, JavaScript is a fragmented clusterfuck. Typescript is leagues better, but by no means perfect. Still, that doesn't explain why the LLM can't recall that I'm using Yarn while it's processing the instruction that specifically told it to use Yarn. Or why it tries to start editing code when I tell it not to. Those are still issues that aren't specific to the language.

  • Just like me

    Python code for reversing the linked list.
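
    For anyone who wants the punchline spelled out, this is the classic snippet the comment alludes to: a minimal iterative reversal of a singly linked list.

    ```python
    # The classic interview answer: reverse a singly linked list in place, iteratively.
    class Node:
        def __init__(self, value, nxt=None):
            self.value = value
            self.next = nxt

    def reverse(head):
        prev = None
        while head:
            nxt = head.next    # remember the rest of the list
            head.next = prev   # point the current node backwards
            prev = head        # advance prev
            head = nxt         # advance head
        return prev            # prev is the new head

    # Build 1 -> 2 -> 3, reverse it, and print 3, 2, 1.
    node = reverse(Node(1, Node(2, Node(3))))
    while node:
        print(node.value)
        node = node.next
    ```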

  • Apple is significantly behind and arrived late to the whole AI hype, so of course it's in their absolute best interest to keep showing how LLMs aren't special or amazingly revolutionary.

    They're not wrong, but the motivation is also pretty clear.

    “Late to the hype” is actually a good thing. Gen AI is a scam wrapped in idiocy wrapped in a joke. That Apple is slow to ape the idiocy of Microsoft is just fine.

  • Brother, you better hope it does, because even if emissions dropped to 0 tonight, the planet wouldn't stop warming, and it wouldn't stop what's coming for us.

    If emissions dropped to 0 tonight, we would be substantially better off than if we maintain our current trajectory. Doomerism helps nobody.

  • This is why I say these articles are so similar to how right wing media covers issues about immigrants.

    Maybe the actual problem is people who equate computer programs with people.

    Then when they pass laws, we’re all primed to accept them removing whatever it is that advantages them and disadvantages us.

    You mean laws like this? jfc.

    Literally what I'm talking about. They have been pushing anti-AI propaganda to alienate the left from embracing it while the right embraces it. You have such a blind spot about this that you can't even see you're making my argument for me.
