Apple just proved AI "reasoning" models like Claude, DeepSeek-R1, and o3-mini don't actually reason at all. They just memorize patterns really well.

Technology
  • Wow it's almost like the computer scientists were saying this from the start but were shouted over by marketing teams.

    And engineers who stood to make a lot of money

  • It does need to do that to meaningfully change anything, however.

    Other way around. The claimed meaningful change (reasoning) has not occurred.

  • LOOK MAA I AM ON FRONT PAGE

    hey I can't recognize patterns so they're smarter than me at least

  • Other way around. The claimed meaningful change (reasoning) has not occurred.

    Meaningful change is not happening because of this paper, either, I don't know why you're playing semantic games with me though.

  • I think it's an easy mistake to confuse sentience and intelligence. It happens in Hollywood all the time - "Skynet began learning at a geometric rate, on July 23 2004 it became self-aware" yadda yadda

    But that's not how sentience works. We don't have to be as intelligent as Skynet supposedly was in order to be sentient. We don't start our lives as unthinking robots, and then one day - once we've finally got a handle on calculus or a deep enough understanding of the causes of the fall of the Roman empire - we suddenly blink into consciousness. On the contrary, even the stupidest humans are accepted as being sentient. Even a young child, not yet able to walk or do anything more than vomit on their parents' new sofa, is considered as a conscious individual.

    So there is no reason to think that AI - whenever it should be achieved, if ever - will be conscious any more than the dumb computers that precede it.

    Good point.

  • Meaningful change is not happening because of this paper, either, I don't know why you're playing semantic games with me though.

    I don't know why you're playing semantic games

    I'm trying to highlight the goal of this paper.

    This is a knock them down paper by Apple justifying (to their shareholders) their non investment in LLMs. It is not a build them up paper trying for meaningful change and to create a better AI.

  • I don't know why you're playing semantic games

    I'm trying to highlight the goal of this paper.

    This is a knock them down paper by Apple justifying (to their shareholders) their non investment in LLMs. It is not a build them up paper trying for meaningful change and to create a better AI.

    That's not the only way to make meaningful change; getting people to give up on LLMs would also be meaningful change. This does very little for anyone who isn't Apple.

  • I hate this analogy. As a throwaway whimsical quip it'd be fine, but it's specious enough that I keep seeing it used earnestly by people who think that LLMs are in any way sentient or conscious, so it's lowered my tolerance for it as a topic even if you did intend it flippantly.

    I don't mean it to extol LLMs but rather to denigrate humans. How many of us are self-imprisoned in echo chambers so we can have our feelings validated and avoid the uncomfortable feeling of thinking critically and perhaps changing viewpoints?

    Humans have the ability to actually think, unlike LLMs. But it's frightening how far we'll go to make sure we don't.

  • The fact that it is a fixed function that depends only on the context, AND that there are a finite number of discrete inputs possible, does make it equivalent to a huge, finite table. You really don't want this to be true. And again, you are describing training. Once training finishes, anything you said no longer applies and you are left with fixed, unchanging matrices, which in turn means that it is a mathematical function of the context (by the mathematical definition of "function": stateless and deterministic) that also has the property that the set of all possible inputs is finite. So the set of possible outputs is also finite, and no larger than the set of possible inputs. This means the actual function that the tokens are passed through CAN be precomputed in full (in theory), making it equivalent to a conventional state transition table.

    This is true whether you like it or not. The training process builds a Markov chain.

    You’re absolutely right that inference in an LLM is a fixed, deterministic function after training, and that the input space is finite due to the discrete token vocabulary and finite context length. So yes, in theory, you could precompute every possible input-output mapping and store them in a giant table. That much is mathematically valid. But where your argument breaks down is in claiming that this makes an LLM equivalent to a conventional Markov chain in function or behavior.

    A Markov chain is not simply defined as “a function from finite context to next-token distribution.” It is defined by a specific type of process where the next state depends on the current state via fixed transition probabilities between discrete states. The model operates over symbolic states with no internal computation. LLMs, even during inference, compute outputs via multi-layered continuous transformations, with attention mixing, learned positional embeddings, and non-linear activations. These mechanisms mean that while the function is fixed, its structure does not resemble a state machine—it resembles a hierarchical pattern recognizer and function approximator.

    Your claim is essentially that “any deterministic function over a finite input space is equivalent to a table.” This is true in a computational sense but misleading in a representational and behavioral sense. If I gave you a function that maps 4096-bit inputs to 50257-dimensional probability vectors and said, “This is equivalent to a transition table,” you could technically agree, but the structure and generative capacity of that function is not Markovian. That function may simulate reasoning, abstraction, and composition. A Markov chain never does.

    You are collapsing implementation equivalence (yes, the function could be stored in a table) with model equivalence (no, it does not behave like a Markov chain). The fact that you could freeze the output behavior into a lookup structure doesn’t change that the lookup structure is derived from a fundamentally different class of computation.

    The training process doesn’t “build a Markov chain.” It builds a function that estimates conditional token probabilities via optimization over a non-Markov architecture. The inference process then applies that function. That makes it a stateless function, yes—but not a Markov chain. Determinism plus finiteness does not imply Markovian behavior.
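For readers following the definition used in the reply above, here is a minimal sketch of a first-order Markov chain over tokens: a table of fixed transition probabilities keyed only by the current state. The function name `build_markov_chain` and the toy corpus are hypothetical, chosen purely for illustration; this shows the textbook construction and makes no claim about how any LLM is implemented.

```python
from collections import Counter, defaultdict


def build_markov_chain(tokens):
    """Estimate first-order transition probabilities from a token sequence.

    Returns a dict mapping each state (token) to a dict of
    {next_token: probability}. The next token depends only on the
    current token, which is the Markov property.
    """
    counts = defaultdict(Counter)
    # Count how often each token is followed by each other token.
    for current, nxt in zip(tokens, tokens[1:]):
        counts[current][nxt] += 1

    chain = {}
    for state, followers in counts.items():
        total = sum(followers.values())
        # Normalize the counts into fixed transition probabilities.
        chain[state] = {tok: n / total for tok, n in followers.items()}
    return chain


# Hypothetical toy corpus, for illustration only.
corpus = "the cat sat on the mat the cat ran".split()
chain = build_markov_chain(corpus)

for state, transitions in chain.items():
    print(state, "->", transitions)
```

Note that the table is indexed by a single symbolic state and nothing else; whether a much larger table indexed by an entire context window should still be called a Markov chain is exactly what the two commenters disagree about.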

  • I'd encourage you to research more about this space and learn more.

    As it is, the statement "Markov chains are still the basis of inference" doesn't make sense, because Markov chains are a separate thing. You might be thinking of Markov decision processes, which are used in training RL agents, but that's also unrelated because these models are not RL agents, they're supervised learning agents. And even if they were RL agents, the MDP describes the training environment, not the model itself, so it's not really used for inference.

    I mean this just as an invitation to learn more, and not pushback for raising concerns. Many in the research community would be more than happy to welcome you into it. The world needs more people who are skeptical of AI doing research in this field.

    Which method, then, is the inference built upon, if not the embeddings? And the question still stands, how does "AI" escape the inherent limits of statistical inference?

  • You’re absolutely right that inference in an LLM is a fixed, deterministic function after training, and that the input space is finite due to the discrete token vocabulary and finite context length. So yes, in theory, you could precompute every possible input-output mapping and store them in a giant table. That much is mathematically valid. But where your argument breaks down is in claiming that this makes an LLM equivalent to a conventional Markov chain in function or behavior.

    A Markov chain is not simply defined as “a function from finite context to next-token distribution.” It is defined by a specific type of process where the next state depends on the current state via fixed transition probabilities between discrete states. The model operates over symbolic states with no internal computation. LLMs, even during inference, compute outputs via multi-layered continuous transformations, with attention mixing, learned positional embeddings, and non-linear activations. These mechanisms mean that while the function is fixed, its structure does not resemble a state machine—it resembles a hierarchical pattern recognizer and function approximator.

    Your claim is essentially that “any deterministic function over a finite input space is equivalent to a table.” This is true in a computational sense but misleading in a representational and behavioral sense. If I gave you a function that maps 4096-bit inputs to 50257-dimensional probability vectors and said, “This is equivalent to a transition table,” you could technically agree, but the structure and generative capacity of that function is not Markovian. That function may simulate reasoning, abstraction, and composition. A Markov chain never does.

    You are collapsing implementation equivalence (yes, the function could be stored in a table) with model equivalence (no, it does not behave like a Markov chain). The fact that you could freeze the output behavior into a lookup structure doesn’t change that the lookup structure is derived from a fundamentally different class of computation.

    The training process doesn’t “build a Markov chain.” It builds a function that estimates conditional token probabilities via optimization over a non-Markov architecture. The inference process then applies that function. That makes it a stateless function, yes—but not a Markov chain. Determinism plus finiteness does not imply Markovian behavior.

    you wouldn't be "freezing" anything. Each possible combination of input tokens maps to one output probability distribution. Those values are fixed and they are what they are whether you compute them or not, or when, or how many times.

    Now you can either precompute the whole table (theory), or somehow compute each cell value every time you need it (practice). In either case, the resulting function (table lookup vs matrix multiplications) takes in only the context, and produces a probability distribution. And the mapping they generate is the same for all possible inputs. So they are the same function. A function can be implemented in multiple ways, but the implementation is not the function itself. The only difference between the two in this case is the implementation, or more specifically, whether you precompute a table or not. But the function itself is the same.

    You are somehow saying that your choice of implementation for that function will somehow change the function. Which means that according to you, if you do precompute (or possibly cache, full precomputation is just an infinite cache size) individual mappings it somehow magically makes some magic happen that gains some deep insight. It does not. We have already established that it is the same function.
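The "same function, different implementation" argument above can be sketched with a toy example. Everything here is an assumption made for illustration: the three-token vocabulary, the context length of 2, the made-up `WEIGHTS`, and the function names. It is not a real language model; it only shows that a fixed, deterministic function over a finite input set yields the same mapping whether it is computed on demand or precomputed into a table.

```python
import itertools
import math

VOCAB = ("a", "b", "c")  # hypothetical 3-token vocabulary
CONTEXT_LEN = 2          # hypothetical fixed context length

# Made-up fixed "weights", standing in for frozen post-training matrices.
WEIGHTS = {
    "a": (0.5, -1.0, 0.25),
    "b": (1.5, 0.0, -0.5),
    "c": (-0.75, 2.0, 1.0),
}


def next_token_distribution(context):
    """Compute a next-token distribution from the context alone."""
    logits = [0.0] * len(VOCAB)
    # Position-weighted sum of per-token score vectors.
    for pos, tok in enumerate(context, start=1):
        for i, w in enumerate(WEIGHTS[tok]):
            logits[i] += pos * w
    # Numerically stable softmax over the logits.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return tuple(e / total for e in exps)


# Precompute the full table: one row per possible context.
TABLE = {
    ctx: next_token_distribution(ctx)
    for ctx in itertools.product(VOCAB, repeat=CONTEXT_LEN)
}


def lookup(context):
    """Return the same distribution via table lookup instead of computation."""
    return TABLE[tuple(context)]


# The two implementations agree on every possible input.
same = all(lookup(ctx) == next_token_distribution(ctx) for ctx in TABLE)
print(len(TABLE), "rows; implementations agree:", same)
```

The table has `len(VOCAB) ** CONTEXT_LEN` rows, which grows far too quickly to precompute at realistic vocabulary and context sizes. That is why the equivalence is a theoretical point rather than a practical one, and the sketch does not settle whether "equivalent to a table" implies "behaves like a Markov chain".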

  • LOOK MAA I AM ON FRONT PAGE

    WTF does the author think reasoning is

  • That depends on your assumption that the left would have anything relevant to gain by embracing AI (whatever that's actually supposed to mean).

    Saw this earlier in the week and thought of you. These short, funny videos are popping up more and more and they're only getting better. They’re sharp, engaging, and they spread like wildfire.

    You strike me as someone who gets what it means when one side embraces the latest tools while the other rejects them.

    The left is still holed up on Lemmy, clinging to “Fuck AI” groups. But why? Go back to the beginning. Look at the early coverage of AI: it was overwhelmingly targeted at left-leaning spaces, full of panic and doom. Compare that to how the right talks about immigration. The headlines are cut and pasted from each other. Same playbook, different topic. The media set out to alienate the left from these tools.

  • Saw this earlier in the week and thought of you. These short, funny videos are popping up more and more and they're only getting better. They’re sharp, engaging, and they spread like wildfire.

    You strike me as someone who gets what it means when one side embraces the latest tools while the other rejects them.

    The left is still holed up on Lemmy, clinging to “Fuck AI” groups. But why? Go back to the beginning. Look at the early coverage of AI: it was overwhelmingly targeted at left-leaning spaces, full of panic and doom. Compare that to how the right talks about immigration. The headlines are cut and pasted from each other. Same playbook, different topic. The media set out to alienate the left from these tools.

    I don't have even the slightest idea what that video is supposed to mean. (Happy cake day tho.)

  • I don't have even the slightest idea what that video is supposed to mean. (Happy cake day tho.)

    Come on, you know what I’m talking about. It’s a channel that started with AI content and is now pivoting to videos about the riots. You can see where this is going. Sooner or later, it’ll expand into targeting protestors and other left-leaning causes.

    It’s a novelty now, but it’s spreading fast, and more channels like it are popping up every day.

    Meanwhile, the left is losing ground. Losing cultural capture. Because as a group, they’re being manipulated into isolating themselves from the very tools and platforms that shape public opinion. Social media. AI. All of it. They're walking away from the battlefield while the other side builds momentum.

  • Come on, you know what I’m talking about. It’s a channel that started with AI content and is now pivoting to videos about the riots. You can see where this is going. Sooner or later, it’ll expand into targeting protestors and other left-leaning causes.

    It’s a novelty now, but it’s spreading fast, and more channels like it are popping up every day.

    Meanwhile, the left is losing ground. Losing cultural capture. Because as a group, they’re being manipulated into isolating themselves from the very tools and platforms that shape public opinion. Social media. AI. All of it. They're walking away from the battlefield while the other side builds momentum.

    you know what I’m talking about

    But I literally don't. Well, I didn't but now I mostly do, since you explained it.

    I get what you're saying with regards to the isolation; this issue was already raised when many left-wing people started to leave Twitter. But it is opening a whole new can of worms: these profiles that post AI-generated content are largely not managed by ordinary people with their private agendas (sharing neat stuff, political agitation, etc.), but by bots, and are also massively followed and supported by other bot profiles. It's much the same on Twitter with its hordes of right-wing troll profiles, and as I'm still somewhat active on reddit I notice blatant manipulation there as well (my country had elections a few weeks ago and the flood of new profiles less than one week old spamming idiotic propaganda and insults was too obvious). It's not organic online behaviour and it can't really be fought by organic behaviour, especially when the big social media platforms give up the tools to fight it (relaxing their moderation standards, removing fact-checking, etc.). Lemmy and Mastodon etc. are based on the idea(l) that this corporate-controlled area is not the only space where meaningful activity can happen.

    So that's one side of the story: AI is not something that happens in a vacuum and that you can just bend to your own will. The other side of the story, the actual abilities of AI, has already been discussed; we've seen sufficiently that it's not that good at helping people form more solidly developed and truth-based stances. Maybe it could be used to spread the sort of mass-produced manipulative bullshit that is already used by the right, but I can't honestly support such stuff. In this regard, we can doubt whether there is any ground to win for the left (would the left's possible audience actually eat it up), and if yes, whether it is worth it (basing your political appeal on bullshit can bite you in the ass down the line).

    As for the comparison to discourse around immigrants, again I still don't fully understand the point other than on the most surface level (the media is guiding people what to think, duh).

  • 47 votes
    18 posts
    1 view
    $5B loss last year.
  • 38 votes
    7 posts
    1 view
    Not easy but not hard actually really simple if you had the right energy. Just ignore this so I don't scare you.
  • An earnest question about the AI/LLM hate

    Technology
    73 votes
    57 posts
    6 views
    ineedmana@lemmy.world
    It might be interesting to cross-post this question to !fuck_ai@lemmy.world but brace for impact
  • The Universal Tech Tree

    Technology
    21 votes
    1 post
    1 view
    No one has replied
  • 58 votes
    5 posts
    0 views
    Amazon is an absolute scumbag company: they don't pay taxes, they shit all over their workers, and they fight unions tooth and nail. I have no idea how people can buy from Amazon, which stands for everything Trump and Musk stand for. Just fucking stop using Amazon if you value democracy. Pay an extra dollar and buy somewhere else.
  • Why doesn't Nvidia have more competition?

    Technology
    33 votes
    22 posts
    2 views
    It’s funny how the article asks the question but completely fails to answer it. About 15 years ago, Nvidia discovered there was a demand for compute in datacenters that could be met with powerful GPUs. They were quick to respond to it, and they had the resources to focus on it strongly because of their huge success and high profitability in the GPU market.

    AMD also saw the market and wanted to pursue it, but just over a decade ago, when the high potential for profitability began to show clearly, AMD was nearly bankrupt and very hard pressed to finance development of GPUs and datacenter compute. AMD really tried the best they could and was moderately successful from a technology perspective, but Nvidia already had a head start, and the proprietary development system CUDA was already an established standard that was very hard to penetrate.

    Intel simply fumbled the ball from start to finish. They spent a decade trying to push ARM off the mobile crown, investing billions, or actually the equivalent of ARM’s total revenue, and never managed to catch up to ARM even though they had the better production process at the time. This was Intel's main focus, and Intel believed that GPUs would never be more than a niche product. So when Intel tried to compete on compute for datacenters, they tried to do it with x86 chips. One of their boldest efforts was to build a monstrosity of a cluster of Celeron chips, which of course performed laughably badly compared to Nvidia!

    As it turns out, the way forward, at least for now, is indeed the massively parallel compute capability of a GPU, which Nvidia has refined for decades with only (inferior) competition from AMD. But despite the lack of competition, Nvidia did not slow down; in fact, with increased profits, they only grew bolder in their efforts, making it even harder to catch up.

    AMD has now had more money to compete for a while, and they do have some decent compute units, but Nvidia remains ahead and the CUDA problem is still there. For AMD to really compete with Nvidia, they have to be better to attract customers. That’s a very tall order against an Nvidia that simply never seems to stop progressing. So the only other option for AMD is to sell a bit cheaper, which I suppose they have to.

    AMD and Intel were the obvious competitors; everybody else is coming from even further behind. But if I had to make a bet, it would be on Huawei. Huawei has some crazy good developers, and Trump is basically forcing them to figure it out themselves, because he is blocking Huawei and China in general from using both AMD and Nvidia AI chips. The chips will probably be made by Chinese SMIC, because they are also prevented from using advanced production in the west, most notably TSMC.

    China will prevail, because it’s become a national project of both prestige and necessity, and they have a massive talent pool and resources, so nothing can stop it now. IMO the USA would clearly have been better off allowing China to use American chips. Now China will soon compete directly on both production and design too.
  • I am disappointed in the AI discourse

    Technology
    7 votes
    27 posts
    2 views
    artocode404@lemmy.dbzer0.com
    I apologize that Lemmy/Reddit people apparently do not have enough self-awareness to accept good criticism, especially if it was just automatically generated, and have downvoted that to oblivion. Though I don't really think you should respond to comments with a ChatGPT link; it's not exactly helpful. Comes off a tad bit AI Bro...
  • 1 vote
    8 posts
    3 views
    I think the principle could be applied to scan outside of the machine. It is making requests to 127.0.0.1:{port}, effectively using your computer as a "server" in a sort of reverse-SSRF attack. There's no reason it can't make requests to 10.10.10.1:{port} as well. Of course you'd need to guess the netmask of the network address range first, but this isn't that hard. At least as far as the desktop site goes, most people will be browsing the web behind a standard consumer router left on defaults, where it will be the first device in the DHCP range (e.g. 192.168.0.1 or 10.10.10.1) and tends to have a web UI on the LAN interface (port 8080, 80 or 443), so you'd only realistically need to scan a few addresses to determine the network address range. If you want to keep noise even lower, I'd wager that using just 192.168.0.1:80 and 192.168.1.1:80 would cover 99% of consumer routers. From there you could assume that it's a /24 netmask and scan IPs to your heart's content. You could do top 10 most common ports type scans and go in-depth on anything you get a result on. I haven't tested this, but I don't see why it wouldn't work. When I was testing 13ft.io, a self-hosted 12ft.io paywall remover, an SSRF flaw like this absolutely let you perform any network request to any LAN address in range.