Apple just proved AI "reasoning" models like Claude, DeepSeek-R1, and o3-mini don't actually reason at all. They just memorize patterns really well.

  • those particular models. It does not prove the architecture doesn't allow it at all. It's still possible that this is solvable with a different training technique, and none of those are using the right one. that's what they need to prove wrong.

    this proves the issue is widespread, not fundamental.

    Is "model" not defined as architecture+weights? Those models certainly don't share the same architecture. I might just be confused about your point though

  • When are people going to realize that, in its current state, an LLM is not intelligent? It doesn’t reason. It does not have intuition. It’s a word predictor.

    People think they want AI, but they don’t even know what AI is on a conceptual level.

  • Funny how triggering it is for some people when anyone acknowledges humans are just evolved primates doing the same pattern matching.

    We actually have sentience, though, and are capable of creating new things and having realizations. AI isn’t real, and LLMs and diffusion models are simply reiterating algorithmic patterns; no LLM or diffusion model can create anything original or expressive.

    Also, we aren’t “evolved primates.” We are just primates. The thing is, primates are the most socially and cognitively evolved species on the planet, so that’s not a denigrating sentiment unless you're a pompous condescending little shit.

  • It’s built by animals, and it reflects them. That’s impressive on its own. Doesn’t need to be exaggerated.

    Impressive ≠ substantial or beneficial.

  • What they mean is that before Turing, "computer" was literally a person's job description. You hand a professional a stack of calculations with some typos, and part of the job is correcting those out. A newfangled machine comes along with the same name as the job, and among the first things people are gonna ask about is where it falls short.

    Like, if I made a machine called "assistant", it'd be natural for people to point out and ask about all the things a person can do that a machine just never could.

    And what I mean is that prior to the mid 1900s the terminology didn't exist to cause that confusion of terms. Neither Babbage's machines nor prior adding engines were called computers or calculators. They were 'machines' or 'engines'.

    Babbage's machines were novel in that they could do multiple types of operations, but 'mechanical calculators' and counting machines were ~200 years old. Other mathematical tools like the abacus are obviously far older. They were not novel enough to cause confusion in anyone with even passing interest.

    But there will always be people who just assume 'magic', and/or "it works like I want it to".

  • LOOK MAA I AM ON FRONT PAGE

    Peak pseudo-science. The burden of evidence is on the grifters who claim "reason". But neither side has any objective definition of what "reason" means. It's pseudo-science against pseudo-science in a fierce battle.

  • Some AI researchers found it obvious as well, in that they had suspected it and had some indications. But it's good to see more data on this to affirm this assessment.

    Particularly to counter some more baseless marketing assertions about the nature of the technology.

  • Wow it's almost like the computer scientists were saying this from the start but were shouted over by marketing teams.

    It's hard to be heard when you're buried under all that sweet VC/grant money.

  • Peak pseudo-science. The burden of evidence is on the grifters who claim "reason". But neither side has any objective definition of what "reason" means. It's pseudo-science against pseudo-science in a fierce battle.

    Even defining reason is hard and becomes a matter of philosophy more than science. For example, apply the same claims to people. Now I've given you something to think about. Or should I say the Markov chain in your head has a new topic to generate thought states for.
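
As an aside on the "Markov chain" and "word predictor" remarks in this thread, here is a minimal sketch of what a first-order Markov chain over words looks like. It is a toy for illustration only: the corpus and the function names `build_chain` and `generate` are made up here, and an actual LLM is far more elaborate than this.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words observed right after it."""
    words = text.split()
    chain = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        chain[current].append(nxt)
    return chain


def generate(chain, start, length, seed=0):
    """Walk the chain; each next word depends only on the current word."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    out = [start]
    for _ in range(length - 1):
        followers = chain.get(out[-1])
        if not followers:  # dead end: this word was never followed by anything
            break
        out.append(rng.choice(followers))
    return " ".join(out)


# Hypothetical toy corpus, not taken from any real dataset.
corpus = "the model predicts the next word and the next word follows the model"
chain = build_chain(corpus)
print(generate(chain, "the", 8))
```

The generator has no memory beyond the current word, which is the defining property of a first-order chain; whether that is a fair description of a brain or an LLM is exactly what the thread is arguing about.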

  • Is "model" not defined as architecture+weights? Those models certainly don't share the same architecture. I might just be confused about your point though

    It is, but this did not prove all architectures cannot reason, nor did it prove that all sets of weights cannot reason.

    essentially they did not prove the issue is fundamental. And they have a pretty similar architecture, they're all transformers trained in a similar way. I would not say they have different architectures.

  • NOOOOOOOOO

    SHIIIIIIIIIITT

    SHEEERRRLOOOOOOCK

    Without being explicit with well researched material, then the marketing presentation gets to stand largely unopposed.

    So this is good even if most experts in the field consider it an obvious result.

  • People think they want AI, but they don’t even know what AI is on a conceptual level.

    Yeah I often think about this Rick N Morty cartoon. Grifters are like, "We made an AI ankle!!!" And I'm like, "That's not actually something that people with busted ankles want. They just want to walk. No need for a sentient ankle." It's a real gross distortion of science how everything needs to be "AI" nowadays.

  • This is why I said I wasn't sure how AI works behind the scenes. But I do know that logic isn't difficult. Just to not fuck around between us. I have a CS background. Only saying this because I think you may have it as well and we can save some time.

    It makes sense to me that logic is something AI can parse easily. Logic in my mind is very easy if it can tokenize some text. Wouldn't the difficulty be whether the AI has the right context?

    Logic requires abstracting the argumentative form from the literal linguistic content and then generalising it, just like how math is done properly when you work with numbers and not just with sentences such as "two apples and three apples is five apples" (such abstraction in practice allows far more powerful and widely applicable operations than dealing with individual linguistic expressions; if you've ever solved very complex truth trees you'll know how they allow streamlining and solutions that would be practically impossible to do if you had only the ordinary linguistic expression of the same problem). Logic doesn't operate with textual tokens but with logical propositions and operators. "Difficulty" is not a meaningful term here; a tool is either technically capable of doing something (more or less successfully) or it isn't.
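
To make "abstracting the argumentative form" concrete, here is a minimal sketch that checks an argument form for validity by brute-force truth table. Note that this uses truth tables rather than the truth trees mentioned above, simply because they are shorter to write; the helper names `is_valid` and `implies` are made up for this example.

```python
from itertools import product


def implies(a, b):
    """Material implication: 'a -> b' is false only when a is true and b is false."""
    return (not a) or b


def is_valid(premises, conclusion, variables):
    """A form is valid if no assignment makes every premise true and the conclusion false."""
    for values in product([True, False], repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(p(env) for p in premises) and not conclusion(env):
            return False  # found a counterexample assignment
    return True


# Modus ponens: P -> Q, P, therefore Q.
modus_ponens = is_valid(
    [lambda e: implies(e["P"], e["Q"]), lambda e: e["P"]],
    lambda e: e["Q"],
    ["P", "Q"],
)

# Affirming the consequent: P -> Q, Q, therefore P.
affirming_consequent = is_valid(
    [lambda e: implies(e["P"], e["Q"]), lambda e: e["Q"]],
    lambda e: e["P"],
    ["P", "Q"],
)

print(modus_ponens, affirming_consequent)  # True False
```

The checker never sees what P and Q stand for, so the verdict holds for any sentences with the same form. That is the point about propositions and operators versus textual tokens.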

    That LLMs aren't capable of this sort of precision and abstraction is shown by the OP link as well as the simple fact that chatbots used to be extremely bad at math (which is now probably patched up by adding a proper math module, rather than relying on the base LLM - my assumption, at least).

    As for trying more examples of looking for logical fallacies, I tried out three different types of text. Since you say context is important, it's best to take only the beginning of a text. One text I tried is the opening of the Wikipedia article on "history", which ChatGPT described like this: "The passage you've provided is an informative and largely neutral overview of the academic discipline of history. It doesn't make any strong arguments or persuasive claims, which are typically where logical fallacies appear." It then went on to nitpick about some details "for the sake of thorough analysis", but basically had no real complaints. Then I tried out the opening paragraph of Moby-Dick. That's a fictional text so it would be reasonable to reject analysing its logical solidity, as GPT already did with the WP article, but it still tried to wring out some "criticism" that occasionally shows how it misunderstands the text (just as it misunderstood a part of my comment above). Finally, I asked it to find the fallacies in the first four paragraphs of Descartes' Meditations on First Philosophy, which resulted in a criticism that was based on less logically rigid principles than the original text (accusing Descartes of the "slippery slope fallacy").

    I'll post the full replies below.

    This famous passage from the opening of Moby-Dick by Herman Melville is rich in literary style but not necessarily constructed as an argument, which makes identifying formal logical fallacies a nuanced task. That said, we can identify a few rhetorical or informal fallacies—not because the text is poorly reasoned, but because it makes some sweeping or questionable assumptions in a stylistically persuasive way. Here's a breakdown:


    1. Hasty Generalization

    "Almost all men in their degree, some time or other, cherish very nearly the same feelings towards the ocean with me."

    • Fallacy: The narrator generalizes his deeply personal and specific emotional reaction to the sea as something nearly all men experience.
    • Why it’s fallacious: No evidence is presented to support this broad claim. It is based purely on the narrator’s subjective experience and intuition.

    2. False Cause (Post Hoc)

    "It is a way I have of driving off the spleen and regulating the circulation."

    • Fallacy: Implies a causal relationship between going to sea and improved physical/emotional health.
    • Why it’s fallacious: While sailing may coincidentally improve his mood or circulation, the text offers no medical or psychological basis to prove a direct causal link.

    3. Appeal to Emotion

    "...especially whenever my hypos get such an upper hand of me, that it requires a strong moral principle to prevent me from deliberately stepping into the street, and methodically knocking people’s hats off..."

    • Fallacy: Uses vivid emotional language to justify behavior or state of mind rather than logical reasoning.
    • Why it’s fallacious: It appeals to the reader’s empathy or amusement to legitimize the narrator’s melancholy and coping strategy, rather than offering a rational justification.

    4. False Analogy

    "With a philosophical flourish Cato throws himself upon his sword; I quietly take to the ship."

    • Fallacy: Compares going to sea with committing suicide as if they are equivalent responses to despair.
    • Why it’s fallacious: The analogy is emotionally compelling, but equating self-harm with a change in setting (boarding a ship) overlooks major differences in motive and consequence.

    Summary:

    | Fallacy | Example from Text | Explanation |
    |---|---|---|
    | Hasty Generalization | “Almost all men... cherish very nearly the same feelings...” | Overgeneralizes personal feeling |
    | False Cause | “...driving off the spleen and regulating the circulation.” | Implies unproven health benefits |
    | Appeal to Emotion | “...prevent me from... knocking people’s hats off.” | Justifies behavior with emotion |
    | False Analogy | “Cato throws himself... I quietly take to the ship.” | Equates suicide and sailing |

    Would you like me to rewrite the passage without the fallacies, or is this for literary analysis?

    Thank you! This passage is again drawn from René Descartes' Meditations on First Philosophy, and it outlines his method of radical doubt as a means to establish certainty in knowledge. While the reasoning is carefully crafted and influential in the history of philosophy, it still contains a few logical fallacies or reasoning weaknesses that can be critically assessed.


    🔍 Identified Logical Fallacies and Weak Reasoning:


    1. Hasty Generalization

    "Whatever I have accepted until now as most true has come to me through my senses. But occasionally I have found that they have deceived me, and it is unwise to trust completely those who have deceived us even once."

    • Why it’s a fallacy: The fact that the senses occasionally deceive (e.g. in illusions, dreams, or misperceptions) does not justify rejecting them entirely.
    • Type: Hasty generalization – concluding that because something is sometimes unreliable, it must always be unreliable.

    2. False Analogy

    "...it is unwise to trust completely those who have deceived us even once."

    • Why it’s a fallacy: Comparing senses (which are not agents but biological faculties) to people who deceive is not a valid analogy. People deceive with intent; senses make errors unintentionally.
    • Type: False analogy – treating unlike things as if they were meaningfully similar.

    3. Slippery Slope (Metaphorical)

    "Once the foundations of a building have been undermined, the rest collapses of its own accord..."

    • Why it’s a fallacy: This implies that if some foundational beliefs are doubted, then all other beliefs must collapse, without demonstrating that all knowledge is entirely dependent on those foundations.
    • Type: Slippery slope – assuming that undermining one part inevitably leads to a total collapse.

    4. Begging the Question (Circular Reasoning)

    "My reason tells me... I should withhold assent from [beliefs] that are not completely certain and indubitable."

    • Why it’s questionable: Descartes is relying on reason to justify doubting everything else, including the faculties (like reason) he has not yet proven to be trustworthy. This risks begging the question, assuming the reliability of reason while attempting to prove everything else, including reason itself.
    • Type: Begging the question – assuming what you're trying to prove.

    5. Inconsistency (Recognized, Not Fully Resolved)

    "Yet although the senses sometimes deceive us... that doesn’t apply to my belief that I am here, sitting by the fire..."

    • Why it’s a fallacy: Here, Descartes briefly exempts ordinary sensory experiences (like feeling warmth, seeing paper) from radical doubt—despite having just declared that any doubt justifies withholding belief. He later corrects for this in the next meditation (introducing the dream argument), but within this passage, it’s a moment of inconsistency.
    • Type: Special pleading or internal inconsistency.

    ✅ Summary Table of Fallacies

    | Fallacy Type | Description |
    |---|---|
    | Hasty Generalization | Rejects all sensory data due to occasional errors. |
    | False Analogy | Compares unreliable senses to untrustworthy people. |
    | Slippery Slope | Claims that doubting foundations causes collapse of all beliefs. |
    | Begging the Question | Uses reason to justify doubting everything, including reason itself. |
    | Inconsistency / Special Pleading | Makes an exception for immediate sensory beliefs without justification. |

    Conclusion:

    Descartes' passage is methodologically rich, but from a logic and reasoning perspective, it contains:

    • Several classic fallacies used to motivate extreme skepticism.
    • Analogical and metaphorical reasoning that may be rhetorically persuasive but logically flawed.
    • Self-referential assumptions that are philosophically problematic (and which he tries to resolve in later meditations).

    Would you like a version of this reasoning revised to be more logically sound, or would you like help analyzing the next part of the Meditations?

  • I see a lot of misunderstandings in the comments 🫤

    This is a pretty important finding for researchers, and it's not obvious by any means. This finding is not showing a problem with LLMs' abilities in general. The issue they discovered is specifically for so-called "reasoning models" that iterate on their answer before replying. It might indicate that the training process is not sufficient for true reasoning.

    Most reasoning models are not incentivized to think correctly, and are only rewarded based on their final answer. This research might indicate that's a flaw that needs to be corrected before models can actually reason.
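
The difference between rewarding only the final answer and rewarding the intermediate steps can be sketched with a toy example. Everything here is an assumption made for illustration: the trace format (`"a op b = c"` strings), the function names, and the scoring rule are hypothetical and do not describe how any particular lab trains its models.

```python
def outcome_reward(trace, final_answer, expected):
    """Reward only the final answer; the intermediate steps are never inspected."""
    return 1.0 if final_answer == expected else 0.0


def step_is_correct(step):
    """Check one toy arithmetic step written as 'a op b = c'."""
    left, right = step.split("=")
    a, op, b = left.split()
    a, b, c = int(a), int(b), int(right)
    return {"+": a + b, "-": a - b, "*": a * b}[op] == c


def process_reward(trace):
    """Reward the fraction of intermediate steps that are actually correct."""
    if not trace:
        return 0.0
    return sum(step_is_correct(s) for s in trace) / len(trace)


# Task: compute (2 + 3) * 2, whose correct answer is 10.
sound_trace = ["2 + 3 = 5", "5 * 2 = 10"]
lucky_trace = ["2 + 3 = 6", "6 * 2 = 10"]  # both steps wrong, answer happens to be right

print(outcome_reward(sound_trace, 10, 10), process_reward(sound_trace))  # 1.0 1.0
print(outcome_reward(lucky_trace, 10, 10), process_reward(lucky_trace))  # 1.0 0.0
```

Under the outcome-only reward the two traces are indistinguishable, which is the kind of gap the comment above is pointing at.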

    There's probably a lot of misunderstanding because these grifters intentionally use misleading language: AI, reasoning, etc.

    If they stuck to scientifically descriptive terms, it would be much more clear and much less sensational.

  • It is, but this did not prove all architectures cannot reason, nor did it prove that all sets of weights cannot reason.

    essentially they did not prove the issue is fundamental. And they have a pretty similar architecture, they're all transformers trained in a similar way. I would not say they have different architectures.

    Ah, gotcha

  • Yeah I often think about this Rick N Morty cartoon. Grifters are like, "We made an AI ankle!!!" And I'm like, "That's not actually something that people with busted ankles want. They just want to walk. No need for a sentient ankle." It's a real gross distortion of science how everything needs to be "AI" nowadays.

    If we ever achieved real AI the immediate next thing we would do is learn how to lobotomize it so that we can use it like a standard program or OS, only it would be suffering internally and wishing for death. I hope the basilisk is real, we would deserve it.

  • Well - if you want to devolve into argument, you can argue all day long about "what is reasoning?"

    This would be a much better paper if it addressed that question in an honest way.

    Instead they just parrot the misleading terminology that they're supposedly debunking.

    How dat collegial boys club undermines science...

  • What confuses me is that we seemingly keep pushing away what counts as reasoning. Not too long ago, some smart algorithms or a bunch of instructions for software (if/then) were officially, by definition, software/computer reasoning. Logically, CPUs do it all the time. Suddenly, when AI is doing that with pattern recognition, memory and even more advanced algorithms, it's no longer reasoning? I feel like at this point a more relevant question is "What exactly is reasoning?". Before you answer, understand that most humans seemingly live by pattern recognition, not reasoning.

    Sure, these grifters are shady AF about their wacky definition of "reason"... But that's just a continuation of the entire "AI" grift.
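
For reference, the if/then style of "software reasoning" mentioned in the quoted comment is roughly what a classic rule engine does. Below is a minimal forward-chaining sketch; the rules and fact names are invented for this example and are not drawn from any real system.

```python
def forward_chain(facts, rules):
    """Apply if/then rules repeatedly until no new facts can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # Fire a rule when all of its conditions are already known.
            if conclusion not in known and all(c in known for c in conditions):
                known.add(conclusion)
                changed = True
    return known


# Hypothetical rules: (list of conditions, conclusion).
rules = [
    (["has_fur", "gives_milk"], "is_mammal"),
    (["is_mammal", "eats_meat"], "is_carnivore"),
]

derived = forward_chain(["has_fur", "gives_milk", "eats_meat"], rules)
print(sorted(derived))
```

The second rule only fires because the first one added `is_mammal`, so the conclusion is chained rather than looked up directly. Whether that counts as "reasoning" is the definitional question the thread keeps circling.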
