
Apple just proved AI "reasoning" models like Claude, DeepSeek-R1, and o3-mini don't actually reason at all. They just memorize patterns really well.

Technology
  • Lots of us who have done some time in search and relevancy early on knew ML was always largely breathless, overhyped marketing. It was endless buzzwords and misframing from the start, but it raised our salaries. Anything execs don't understand is profitable and worth doing.

    Machine-learning-based pattern matching is indeed very useful and profitable when applied correctly: identify (with confidence levels) features in data that would otherwise take an extremely well-trained person. And even then it's just the cursory screening pass that takes the longest, before presenting the highest-confidence candidates to a person for evaluation. Think: scanning medical data for indicators of cancer, reading live data from machines to predict failure, etc.

    And what we call "AI" right now is just a much, much more user-friendly version of pattern matching: the primary feature of LLMs is that they natively interact with plain-language prompts.
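
The screening workflow described above (a model scores everything; a human reviews only the high-confidence candidates) can be sketched in a few lines. The scores, IDs, and threshold here are made up for illustration; a real system would get confidences from a trained classifier:

```python
# Toy sketch of confidence-based screening. The scores are hypothetical
# stand-ins for the output of a trained model; only the triage logic is real.

def triage(candidates, threshold=0.8):
    """Return candidates whose model confidence meets the threshold,
    sorted so the reviewer sees the most suspicious cases first."""
    flagged = [c for c in candidates if c["confidence"] >= threshold]
    return sorted(flagged, key=lambda c: c["confidence"], reverse=True)

scans = [
    {"id": "scan-001", "confidence": 0.97},  # strong indicator
    {"id": "scan-002", "confidence": 0.35},  # likely benign
    {"id": "scan-003", "confidence": 0.88},  # worth a second look
]

for scan in triage(scans):
    print(scan["id"], scan["confidence"])
```

The model never decides anything; it only orders the queue so the expensive human attention goes where it matters most.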

  • That indicates that this particular model does not follow instructions, not that it is architecturally fundamentally incapable.

    Not "this particular model". Frontier LRMs such as OpenAI's o1/o3, DeepSeek-R1, Claude 3.7 Sonnet Thinking, and Gemini Thinking.

    The paper shows that Large Reasoning Models as defined today cannot interpret instructions. Their architecture does not allow it.

  • I'm not trained or paid to reason, I am trained and paid to follow established corporate procedures. On rare occasions my input is sought to improve those procedures, but the vast majority of my time is spent executing tasks governed by a body of (not quite complete, sometimes conflicting) procedural instructions.

    If AI can execute those procedures as well as, or better than, human employees, I doubt employers will care if it is reasoning or not.

    Sure. We weren't discussing if AI creates value or not. If you ask a different question then you get a different answer.

  • By that metric, you can argue Kasparov isn’t thinking during chess

    Kasparov's thinking fits pretty much all biological definitions of thinking. Which is the entire point.

    Is thinking necessarily biologic?

  • LLMs deal with tokens. Essentially, predicting a series of bytes.

    Humans do much, much, much, much, much, much, much more than that.

    No. They don't. We just call them proteins.
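
For readers unfamiliar with what "predicting a series of tokens" looks like mechanically, here is a toy bigram predictor. It only illustrates the predict-the-next-token step; a real LLM conditions on the entire context window with a neural network, not a frequency count:

```python
from collections import Counter, defaultdict

# Toy bigram "language model": count which token follows which in a tiny
# corpus, then predict the most frequent successor.
corpus = "the cat sat on the mat the cat ran".split()

follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(token):
    """Return the successor token seen most often in training."""
    return follows[token].most_common(1)[0][0]

print(predict_next("the"))  # "cat" follows "the" twice, "mat" once
```

Generating text is then just applying this prediction in a loop and feeding each output back in as the next input.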

  • LOOK MAA I AM ON FRONT PAGE

    Wow it's almost like the computer scientists were saying this from the start but were shouted over by marketing teams.

  • OK, and? A car doesn't run like a horse either, yet they are still very useful.

    I'm fine with the distinction between human reasoning and LLM "reasoning".

    The guy selling the car doesn't tell you it runs like a horse; the guy selling you AI is telling you it has reasoning skills. AI absolutely has utility, but the guys making it are saying its utility is nearly limitless, because Tesla has demonstrated there's no actual penalty for lying to investors.

  • Lots of us who have done some time in search and relevancy early on knew ML was always largely breathless, overhyped marketing. It was endless buzzwords and misframing from the start, but it raised our salaries. Anything execs don't understand is profitable and worth doing.

    Ragebait?

    I'm in robotics and find plenty of use for ML methods. Think of image classifiers: how would you approach those without oversimplified problem settings?
    Or take control and coordination problems, which can become NP-hard. Even though not optimal, ML methods are quite solid at learning patterns in high-dimensional NP-hard problem settings. In computation effort vs. solution quality, they often outperform hand-crafted conventional suboptimal solvers, and they beat (asymptotically) optimal solvers on time, even if the solutions are only "good enough". (To be fair, suboptimal solvers do that as well, but since ML methods can outperform these too, I see them as an attractive middle ground.)

  • Wow it's almost like the computer scientists were saying this from the start but were shouted over by marketing teams.

    This! Capitalism is going to be the end of us all. OpenAI has gotten away with IP theft, disinformation about AI, and maybe even the murder of their whistleblower.

  • What confuses me is that we seemingly keep pushing away what counts as reasoning. Not too long ago, some smart algorithms or a bunch of instructions for software (if/then) were officially, by definition, software/computer reasoning. Logically, CPUs do it all the time. Suddenly, when AI is doing that with pattern recognition, memory, and even more advanced algorithms, it's no longer reasoning? I feel like at this point a more relevant question is "What exactly is reasoning?". Before you answer, understand that most humans seemingly live by pattern recognition, not reasoning.

    If you want to boil down human reasoning to pattern recognition, the sheer amount of stimuli and associations built off of that input absolutely dwarfs anything an LLM will ever be able to handle. It's like comparing PhD reasoning to a dog's reasoning.

    While a dog can learn some interesting tricks and the smartest dogs can solve simple novel problems, there are hard limits. They simply lack strong metacognition and the ability to make simple logical inferences (e.g., why they fail at the shell game).

    Now we make that chasm even larger by cutting the stimuli to a fixed token limit. An LLM can do some clever tricks within that limit, but it's designed to do exactly those tricks and nothing more. To get anything resembling human ability you would have to design something to match human complexity, and we don't have the tech to make a synthetic human.

    Not "this particular model". Frontier LRMs such as OpenAI's o1/o3, DeepSeek-R1, Claude 3.7 Sonnet Thinking, and Gemini Thinking.

    The paper shows that Large Reasoning Models as defined today cannot interpret instructions. Their architecture does not allow it.

    Those particular models. It does not prove the architecture doesn't allow it at all. It's still possible that this is solvable with a different training technique and that none of those models use the right one; that's what the paper would need to rule out.

    This proves the issue is widespread, not that it's fundamental.

  • No. They don't. We just call them proteins.

    You are either vastly overestimating the Language part of an LLM or simplifying human physiology back to the Greeks' Four Humours theory.

  • No. They don't. We just call them proteins.

    "They".

    What are you?

  • That’s absolutely what it is. It’s a pattern on here. Any acknowledgment of humans being animals or less than superior gets hit with pushback.

    I didn't say we aren't animals or that we don't follow physics rules.

    But what you're saying is the equivalent of "everything that goes up will eventually go down - that's how physics works and you don't see that, you're in denial!!!11!!!1"

  • Proving it matters. Science is constantly disproving things that people believe are obvious, because people have an uncanny ability to believe things that are false. Some people will keep believing things long after science has proven them false.

    I mean… “proving” is also just marketing speak. There is no clear definition of reasoning, so there’s also no way to prove or disprove that something/someone reasons.

  • While a fair idea, there are still two issues with that: hallucinations and the cost of running the models.

    Unfortunately, it takes significant compute resources to produce even simple responses, and those responses can be totally made up yet still look completely real. It's gotten much better, sure, but blindly trusting these things (which many people do) can have serious consequences.

    Hallucinations and the cost of running the models.

    So, inaccurate information in books is nothing new. Agreed that the rate of hallucinations needs to decline, a lot, but there has always been a need for a veracity filter - just because it comes from "a book" or "the TV" has never been an indication of absolute truth, even though many people stop there and assume it is. In other words: blind trust is not a new problem.

    The cost of running the models is an interesting one - how does it compare with publication on paper to ship globally to store in environmentally controlled libraries which require individuals to physically travel to/from the libraries to access the information? What's the price of the resulting increased ignorance of the general population due to the high cost of information access?

    What good is a bunch of knowledge stuck behind a search engine when people don't know how to access it, or access it efficiently?

    Granted, search engines already take us 95% (IMO) of the way from paper libraries to what AI is almost succeeding in being today, but ease of access of information has tremendous value - and developing ways to easily access the information available on the internet is a very valuable endeavor.

    Personally, I feel more emphasis should be put on establishing the veracity of the information before we go making all the garbage easier to find.

    I also worry that "easy access" to automated interpretation services is going to lead to a bunch of information encoded in languages that most people don't know because they're dependent on machines to do the translation for them. As an example: shiny new computer language comes out but software developer is too lazy to learn it, developer uses AI to write code in the new language instead...

  • Sure. We weren't discussing if AI creates value or not. If you ask a different question then you get a different answer.

    Well - if you want to devolve into argument, you can argue all day long about "what is reasoning?"

  • When are people going to realize that, in its current state, an LLM is not intelligent? It doesn't reason. It does not have intuition. It's a word predictor.

    I agree with you. In its current state, an LLM is not sentient, and thus not "intelligence".

    "Lacks internal computation" is not part of the definition of a Markov chain. The only requirement is that the output depends on the current state (the whole context, not just the last token) and not on any previous history, just like LLMs. They do not consider tokens that slid out of the current context, because those are no longer part of the state.

    And it wouldn't be a cache unless you decide to start invalidating entries, which you could simply not do. It would be a table with token_alphabet_size^context_length entries, each entry being a vector of size token_alphabet_size. Because that would be far too big to realistically store, we do not precompute the whole thing; we just approximate what each table entry should be using a neural network.

    The pi example was just to show that how you implement a function (any function) does not matter, as long as the inputs and outputs are the same. Or to put it another way if you give me an index, then you wouldn't know whether I got the result by doing some computations or using a precomputed table.

    Likewise, if you give me a sequence of tokens and I give you a probability distribution, you can't tell whether I used a neural network or just consulted a precomputed table. The point is that, given the same input, the table will always give the same result, and crucially, so will an LLM. A table is just one type of implementation of an arbitrary function.

    There is also no requirement for the state transition function (a table is a special type of function) to be understandable by humans. Just because it's big enough to be beyond human comprehension doesn't change its nature.

    You're correct that the formal definition of a Markov process does not exclude internal computation, and that it only requires the next state to depend solely on the current state. But what defines a classical Markov chain in practice is not just the formal dependency structure but how the transition function is structured and used. A traditional Markov chain has a discrete and enumerable state space with explicit, often simple transition probabilities between those states. LLMs do not operate this way.

    The claim that an LLM is "just" a large compressed Markov chain assumes that its function is equivalent to a giant mapping of input sequences to output distributions. But this interpretation fails to account for the fundamental difference in how those distributions are generated. An LLM is not indexing a symbolic structure. It is computing results using recursive transformations across learned embeddings, where those embeddings reflect complex relationships between tokens, concepts, and tasks. That is not reducible to discrete symbolic transitions without losing the model’s generalization capabilities. You could record outputs for every sequence, but the moment you present a sequence that wasn't explicitly in that set, the Markov table breaks. The LLM does not.

    Yes, you can say a table is just one implementation of a function, and from a purely mathematical perspective, any function can be implemented as a table given enough space. But the LLM’s function is general-purpose. It extrapolates. A precomputed table cannot do this unless those extrapolations are already baked in, in which case you are no longer talking about a classical Markov system. You are describing a model that encodes relationships far beyond discrete transitions.

    The pi analogy applies to deterministic functions with fixed outputs, not to learned probabilistic functions that approximate conditional distributions over language. If you give an LLM a new input, it will return a meaningful distribution even if it has never seen anything like it. That behavior depends on internal structure, not retrieval. Just because a function is deterministic at temperature 0 does not mean it is a transition table. The fact that the same input yields the same output is true for any deterministic function. That does not collapse the distinction between generalization and enumeration.

    So while yes, you can implement any deterministic function as a lookup table, the nature of LLMs lies in how they model relationships and extrapolate from partial information. That ability is not captured by any classical Markov model, no matter how large.
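
The distinction argued above (an enumerated transition table versus a function that computes a distribution) can be made concrete with a hypothetical sketch. The "model" below is a trivial letter-overlap scorer standing in for a neural network; the point is only that a computed function answers for unseen contexts while an enumerated table cannot:

```python
# A classical Markov chain: explicit, enumerable transitions.
# Any context not enumerated simply is not in the table.
table = {
    ("the", "cat"): {"sat": 0.7, "ran": 0.3},
    ("cat", "sat"): {"on": 1.0},
}

def table_next(context):
    return table[context]  # raises KeyError on any unseen context

# A parameterized function: computes a distribution from features of the
# input, so it returns *something* for every context. (A toy stand-in for
# a neural network, not an actual one.)
def model_next(context):
    vocab = ["sat", "ran", "on"]
    text = "".join(context)
    # trivial "computation": score each token by letters shared with the context
    scores = [sum(w.count(ch) for ch in text) + 1 for w in vocab]
    total = sum(scores)
    return {w: s / total for w, s in zip(vocab, scores)}

print(table_next(("the", "cat")))     # enumerated context: works
print(model_next(("purple", "cat")))  # never enumerated: still a distribution
try:
    table_next(("purple", "cat"))
except KeyError:
    print("table breaks on unseen context")
```

Whether that computed interpolation deserves the word "generalization" is exactly what the two comments above disagree about.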

  • When are people going to realize that, in its current state, an LLM is not intelligent? It doesn't reason. It does not have intuition. It's a word predictor.

    And that's pretty damn useful, but obnoxious to have expectations wildly set incorrectly.
