
Apple just proved AI "reasoning" models like Claude, DeepSeek-R1, and o3-mini don't actually reason at all. They just memorize patterns really well.

  • No, not every computer program is a Markov chain; only those that depend solely on the current state and ignore prior history. Which fits LLMs perfectly.

    Those sophisticated methods you talk about are just a couple of matrix multiplications. Those matrices are what's learned. Anything sophisticated happens during training. Inference is not sophisticated at all: just multiplying some matrices together and taking the rightmost column of the result. That's it.
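    (For concreteness, a minimal, hypothetical sketch of that view: toy sizes, random matrices standing in for learned weights, and attention crudely reduced to a mean over embeddings. Nothing here is a real model's API.)

    ```python
    import numpy as np

    # Toy illustration (hypothetical, not a real LLM): the next-token distribution
    # as a pure, stateless function of the current context. All "memory" lives in
    # fixed weight matrices; attention is crudely replaced by a mean over embeddings.
    rng = np.random.default_rng(0)
    VOCAB, DIM = 10, 4
    E = rng.normal(size=(VOCAB, DIM))   # fixed "learned" embedding matrix
    W = rng.normal(size=(DIM, VOCAB))   # fixed "learned" output projection

    def next_token_dist(context_tokens):
        """Depends only on the tokens passed in; nothing persists between calls."""
        h = E[context_tokens].mean(axis=0)   # collapse the context into one vector
        logits = h @ W                       # "multiply some matrices together"
        p = np.exp(logits - logits.max())
        return p / p.sum()                   # probability distribution over the next token

    print(next_token_dist([1, 2, 3]))        # same context in -> same distribution out
    ```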

    Yes, LLM inference consists of deterministic matrix multiplications applied to the current context. But that simplicity in operations does not make it equivalent to a Markov chain. The definition of a Markov process requires that the next output depends only on the current state. You’re assuming that the LLM’s “state” is its current context window. But in an LLM, this “state” is not discrete. It is a structured, deeply encoded set of vectors shaped by non-linear transformations across layers. The state is not just the visible tokens—it is the full set of learned representations computed from them.

    A Markov chain transitions between discrete, enumerable states with fixed transition probabilities. LLMs instead apply a learned function over a high-dimensional, continuous input space, producing outputs by computing context-sensitive interactions. These interactions allow generalization and compositionality, not just selection among known paths.

    The fact that inference uses fixed weights does not mean it reduces to a transition table. The output is computed by composing multiple learned projections, attention mechanisms, and feedforward layers that operate in ways no Markov chain ever has. You can’t describe an attention head with a transition matrix. You can’t reduce positional encoding or attention-weighted context mixing into state transitions. These are structured transformations, not symbolic transitions.

    You can describe any deterministic process as a function, but not all deterministic functions are Markovian. What makes a process Markov is not just forgetting prior history. It is having a fixed, memoryless probabilistic structure where transitions depend only on a defined discrete state. LLMs don’t transition between states in this sense. They recompute probability distributions from scratch each step, based on context-rich, continuous-valued encodings. That is not a Markov process. It’s a stateless function approximator conditioned on a window, built to generalize across unseen input patterns.
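    For reference, the textbook first-order Markov property that both sides keep invoking:

    ```latex
    % First-order Markov property: the distribution of the next state depends
    % only on the current state, not on the rest of the history.
    P(X_{t+1} = s' \mid X_t = s_t, X_{t-1} = s_{t-1}, \dots, X_0 = s_0)
      = P(X_{t+1} = s' \mid X_t = s_t)
    ```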

  • Claiming it's just marketing fluff indicates you do not know what you're talking about.

    They published a research paper on it. You are free to publish your own paper disproving theirs.

    At the moment, you sound like one of those "I did my own research" people except you didn't even bother doing your own research.

    You misunderstand. I do not take issue with anything that’s written in the scientific paper. What I take issue with is how the paper is marketed to the general public. When you read the article you will see that it does not claim to “prove” that these models cannot reason. It merely points out some strengths and weaknesses of the models.

  • Yes, LLM inference consists of deterministic matrix multiplications applied to the current context. But that simplicity in operations does not make it equivalent to a Markov chain. The definition of a Markov process requires that the next output depends only on the current state. You’re assuming that the LLM’s “state” is its current context window. But in an LLM, this “state” is not discrete. It is a structured, deeply encoded set of vectors shaped by non-linear transformations across layers. The state is not just the visible tokens—it is the full set of learned representations computed from them.

    A Markov chain transitions between discrete, enumerable states with fixed transition probabilities. LLMs instead apply a learned function over a high-dimensional, continuous input space, producing outputs by computing context-sensitive interactions. These interactions allow generalization and compositionality, not just selection among known paths.

    The fact that inference uses fixed weights does not mean it reduces to a transition table. The output is computed by composing multiple learned projections, attention mechanisms, and feedforward layers that operate in ways no Markov chain ever has. You can’t describe an attention head with a transition matrix. You can’t reduce positional encoding or attention-weighted context mixing into state transitions. These are structured transformations, not symbolic transitions.

    You can describe any deterministic process as a function, but not all deterministic functions are Markovian. What makes a process Markov is not just forgetting prior history. It is having a fixed, memoryless probabilistic structure where transitions depend only on a defined discrete state. LLMs don’t transition between states in this sense. They recompute probability distributions from scratch each step, based on context-rich, continuous-valued encodings. That is not a Markov process. It’s a stateless function approximator conditioned on a window, built to generalize across unseen input patterns.

    The fact that it is a fixed function that depends only on the context, AND that there are a finite number of discrete inputs possible, does make it equivalent to a huge, finite table. You really don't want this to be true. And again, you are describing training. Once training finishes, nothing you said applies anymore and you are left with fixed, unchanging matrices, which in turn means that it is a mathematical function of the context (by the mathematical definition of "function": stateless and deterministic), one that also has the property that the set of all possible inputs is finite. So the set of possible outputs is also finite and at most as large as the set of possible inputs. This means the actual function that the tokens are passed through can (in theory) be precomputed in full, making it equivalent to a conventional state transition table.

    This is true whether you'd like it to be or not. The training process builds a Markov chain.
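    A rough sense of how "huge" that table would be, under assumed GPT-2/3-ish figures (a ~50,000-token vocabulary and a 4,096-token context window):

    ```python
    import math

    # Back-of-the-envelope size of the hypothetical precomputed table.
    # Assumed (GPT-2/3-ish) figures: ~50,000-token vocabulary, 4,096-token context.
    vocab, ctx = 50_000, 4_096
    log10_rows = ctx * math.log10(vocab)   # log10 of the number of possible contexts
    print(f"~10^{log10_rows:.0f} rows")    # ~10^19247 rows, one per possible context
    ```

    So the equivalence only exists "in theory": the table is a well-defined mathematical object, but nothing remotely that size could ever be materialized.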

  • This paper does provide a solid proof by counterexample of reasoning (i.e., following an algorithm) not occurring when it should.

    The paper doesn't need to prove that reasoning never has occurred or never will. It only demonstrates that current claims of AI reasoning are overhyped.

    It does need to do that to meaningfully change anything, however.

  • Intuition is about the only thing it has. It's a statistical system. The problem is it doesn't have logic. We assume that because it's computer-based it must be more logic-oriented, but it's the opposite. That's the problem. We can't get it to do logic very well because it basically feels out the next token by something like instinct. In particular, it doesn't mask or disregard irrelevant information very well if two segments are near each other in embedding space, which doesn't guarantee relevance. So then the model is just weighing all of this info, relevant or irrelevant, into a weighted feeling for the next token (a toy numeric sketch of this follows at the end of this comment).

    This is the core problem. People can handle fuzzy topics and discrete topics. But we really struggle to create any system that can do both like we can. Either we create programming logic that is purely discrete or we create statistics that are fuzzy.

    Of course, this issue of masking out information that is close in embedding space but irrelevant to a logical premise is something many humans suck at too. But high-functioning humans don't, and we can't get these models to copy that ability. Too many people, sadly many on the left in particular, not only treat association as always relevant but sometimes as equivalence. E.g., racism is associated with nazism, which is associated with patriarchy, which is historically related to the origins of capitalism, ∴ nazism ≡ capitalism. Meanwhile, national socialism was anti-capitalist. Associative thinking removes nuance. And sadly some people think this way. And they 100% can be replaced by LLMs today, because at least the LLM is mimicking what logic looks like better, though it is still built on blind association. It just has more blind associations, and fine-tuned weighting for summing them, than a human does. So it can carry that appearance of logic further than a human who is on the associative thought train can.
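    Going back to the embedding-space point above: a toy sketch, with made-up vectors, of how similarity-driven weighting ignores logical relevance:

    ```python
    import numpy as np

    # Toy, made-up numbers: attention-style weighting is driven by vector
    # similarity (geometry), not by logical relevance.
    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    query      = np.array([1.0, 0.0, 1.0])   # what the model is currently "asking about"
    relevant   = np.array([0.9, 0.1, 0.8])   # genuinely relevant piece of context
    irrelevant = np.array([1.0, 0.0, 1.1])   # off-topic, but happens to sit nearby

    weights = softmax(np.array([query @ relevant, query @ irrelevant]))
    print(weights)  # the off-topic-but-nearby vector gets the larger weight
    ```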

    You had a compelling description of how ML models work and just had to swerve into politics, huh?

  • For me it kinda went the other way: I'm almost convinced that human intelligence is the same pattern repeating, just more general (for now).

    Except that wouldn't explain consciousness. There's absolutely no need for consciousness, or an illusion(*) of consciousness. Yet we have it.

    • Arguably, consciousness can by definition not be an illusion. We either perceive "ourselves" or we don't.

  • Wow it's almost like the computer scientists were saying this from the start but were shouted over by marketing teams.

    And engineers who stood to make a lot of money

  • It does need to do that to meaningfully change anything, however.

    Other way around. The claimed meaningful change (reasoning) has not occurred.

  • LOOK MAA I AM ON FRONT PAGE

    hey, I can't recognize patterns, so they're smarter than me at least

  • Other way around. The claimed meaningful change (reasoning) has not occurred.

    Meaningful change is not happening because of this paper either. I don't know why you're playing semantic games with me, though.

  • I think it's an easy mistake to confuse sentience and intelligence. It happens in Hollywood all the time - "Skynet began learning at a geometric rate, on July 23 2004 it became self-aware" yadda yadda

    But that's not how sentience works. We don't have to be as intelligent as Skynet supposedly was in order to be sentient. We don't start our lives as unthinking robots, and then one day - once we've finally got a handle on calculus or a deep enough understanding of the causes of the fall of the Roman empire - suddenly blink into consciousness. On the contrary, even the stupidest humans are accepted as being sentient. Even a young child, not yet able to walk or do anything more than vomit on their parents' new sofa, is considered a conscious individual.

    So there is no reason to think that AI - whenever it should be achieved, if ever - will be conscious any more than the dumb computers that precede it.

    Good point.

    Meaningful change is not happening because of this paper either. I don't know why you're playing semantic games with me, though.

    I don't know why you're playing semantic games

    I'm trying to highlight the goal of this paper.

    This is a knock-them-down paper by Apple justifying (to their shareholders) their non-investment in LLMs. It is not a build-them-up paper trying for meaningful change and to create a better AI.

  • I don't know why you're playing semantic games

    I'm trying to highlight the goal of this paper.

    This is a knock-them-down paper by Apple justifying (to their shareholders) their non-investment in LLMs. It is not a build-them-up paper trying for meaningful change and to create a better AI.

    That's not the only way to make meaningful change; getting people to give up on LLMs would also be meaningful change. This does very little for anyone who isn't Apple.

  • I hate this analogy. As a throwaway whimsical quip it'd be fine, but it's specious enough that I keep seeing it used earnestly by people who think that LLMs are in any way sentient or conscious, so it's lowered my tolerance for it as a topic even if you did intend it flippantly.

    I don't mean it to extol LLMs but rather to denigrate humans. How many of us are self-imprisoned in echo chambers so we can have our feelings validated and avoid the uncomfortable feeling of thinking critically and perhaps changing viewpoints?

    Humans have the ability to actually think, unlike LLMs. But it's frightening how far we'll go to make sure we don't.

    The fact that it is a fixed function that depends only on the context, AND that there are a finite number of discrete inputs possible, does make it equivalent to a huge, finite table. You really don't want this to be true. And again, you are describing training. Once training finishes, nothing you said applies anymore and you are left with fixed, unchanging matrices, which in turn means that it is a mathematical function of the context (by the mathematical definition of "function": stateless and deterministic), one that also has the property that the set of all possible inputs is finite. So the set of possible outputs is also finite and at most as large as the set of possible inputs. This means the actual function that the tokens are passed through can (in theory) be precomputed in full, making it equivalent to a conventional state transition table.

    This is true whether you'd like it to be or not. The training process builds a Markov chain.

    You’re absolutely right that inference in an LLM is a fixed, deterministic function after training, and that the input space is finite due to the discrete token vocabulary and finite context length. So yes, in theory, you could precompute every possible input-output mapping and store them in a giant table. That much is mathematically valid. But where your argument breaks down is in claiming that this makes an LLM equivalent to a conventional Markov chain in function or behavior.

    A Markov chain is not simply defined as “a function from finite context to next-token distribution.” It is defined by a specific type of process where the next state depends on the current state via fixed transition probabilities between discrete states. Such a model operates over symbolic states with no internal computation. LLMs, even during inference, compute outputs via multi-layered continuous transformations, with attention mixing, learned positional embeddings, and non-linear activations. These mechanisms mean that while the function is fixed, its structure does not resemble a state machine—it resembles a hierarchical pattern recognizer and function approximator.

    Your claim is essentially that “any deterministic function over a finite input space is equivalent to a table.” This is true in a computational sense but misleading in a representational and behavioral sense. If I gave you a function that maps 4096-bit inputs to 50257-dimensional probability vectors and said, “This is equivalent to a transition table,” you could technically agree, but the structure and generative capacity of that function is not Markovian. That function may simulate reasoning, abstraction, and composition. A Markov chain never does.

    You are collapsing implementation equivalence (yes, the function could be stored in a table) with model equivalence (no, it does not behave like a Markov chain). The fact that you could freeze the output behavior into a lookup structure doesn’t change that the lookup structure is derived from a fundamentally different class of computation.

    The training process doesn’t “build a Markov chain.” It builds a function that estimates conditional token probabilities via optimization over a non-Markov architecture. The inference process then applies that function. That makes it a stateless function, yes—but not a Markov chain. Determinism plus finiteness does not imply Markovian behavior.
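    For contrast, here is what a literal first-order Markov chain over tokens looks like (probabilities made up for illustration): the whole model is the transition table, and each generation step is a single lookup.

    ```python
    import random

    # What a literal first-order Markov chain over tokens looks like: discrete
    # states, fixed transition probabilities, and no internal computation.
    # Probabilities are made up for illustration.
    transitions = {
        "the": {"cat": 0.5, "dog": 0.5},
        "cat": {"sat": 0.9, "ran": 0.1},
        "dog": {"sat": 0.3, "ran": 0.7},
        "sat": {"the": 1.0},
        "ran": {"the": 1.0},
    }

    def step(state):
        """The next state depends only on the current state, via a fixed lookup."""
        nxt, probs = zip(*transitions[state].items())
        return random.choices(nxt, weights=probs)[0]

    state = "the"
    print([state := step(state) for _ in range(6)])
    ```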

  • I'd encourage you to research more about this space and learn more.

    As it is, the statement "Markov chains are still the basis of inference" doesn't make sense, because Markov chains are a separate thing. You might be thinking of Markov decision processes, which are used in training RL agents, but that's also unrelated, because these models are not RL agents; they're supervised learning agents. And even if they were RL agents, the MDP describes the training environment, not the model itself, so it's not really used for inference.

    I mean this just as an invitation to learn more, and not pushback for raising concerns. Many in the research community would be more than happy to welcome you into it. The world needs more people who are skeptical of AI doing research in this field.

    Which method, then, is the inference built upon, if not the embeddings? And the question still stands: how does "AI" escape the inherent limits of statistical inference?

  • You’re absolutely right that inference in an LLM is a fixed, deterministic function after training, and that the input space is finite due to the discrete token vocabulary and finite context length. So yes, in theory, you could precompute every possible input-output mapping and store them in a giant table. That much is mathematically valid. But where your argument breaks down is in claiming that this makes an LLM equivalent to a conventional Markov chain in function or behavior.

    A Markov chain is not simply defined as “a function from finite context to next-token distribution.” It is defined by a specific type of process where the next state depends on the current state via fixed transition probabilities between discrete states. Such a model operates over symbolic states with no internal computation. LLMs, even during inference, compute outputs via multi-layered continuous transformations, with attention mixing, learned positional embeddings, and non-linear activations. These mechanisms mean that while the function is fixed, its structure does not resemble a state machine—it resembles a hierarchical pattern recognizer and function approximator.

    Your claim is essentially that “any deterministic function over a finite input space is equivalent to a table.” This is true in a computational sense but misleading in a representational and behavioral sense. If I gave you a function that maps 4096-bit inputs to 50257-dimensional probability vectors and said, “This is equivalent to a transition table,” you could technically agree, but the structure and generative capacity of that function is not Markovian. That function may simulate reasoning, abstraction, and composition. A Markov chain never does.

    You are collapsing implementation equivalence (yes, the function could be stored in a table) with model equivalence (no, it does not behave like a Markov chain). The fact that you could freeze the output behavior into a lookup structure doesn’t change that the lookup structure is derived from a fundamentally different class of computation.

    The training process doesn’t “build a Markov chain.” It builds a function that estimates conditional token probabilities via optimization over a non-Markov architecture. The inference process then applies that function. That makes it a stateless function, yes—but not a Markov chain. Determinism plus finiteness does not imply Markovian behavior.

    You wouldn't be "freezing" anything. Each possible combination of input tokens maps to one output probability distribution. Those values are fixed, and they are what they are whether you compute them or not, or when, or how many times.

    Now you can either precompute the whole table (in theory), or compute each cell value every time you need it (in practice). In either case, the resulting function (table lookup vs. matrix multiplications) takes in only the context and produces a probability distribution. And the mapping they generate is the same for all possible inputs, so they are the same function. A function can be implemented in multiple ways, but the implementation is not the function itself. The only difference between the two in this case is the implementation, or more specifically, whether you precompute a table or not. But the function itself is the same.

    You are somehow saying that your choice of implementation for that function will change the function. Which means that, according to you, if you precompute individual mappings (or possibly cache them; full precomputation is just an infinite cache), something magical happens that gains some deep insight. It does not. We have already established that it is the same function.
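    As a toy illustration of that last point, here is the same mapping implemented twice, once by on-demand computation and once as a precomputed table (a small stand-in function, not an actual LLM):

    ```python
    # Toy illustration of "same function, different implementation", with a small
    # stand-in function in place of the LLM.
    def f_computed(x: int) -> int:
        return (3 * x + 1) % 7                       # compute the value on demand

    DOMAIN = range(7)                                # finite input space
    f_table = {x: f_computed(x) for x in DOMAIN}     # precompute the whole mapping

    # Identical output for every possible input, so they are the same function;
    # only the implementation (lookup vs. computation) differs.
    assert all(f_table[x] == f_computed(x) for x in DOMAIN)
    print(f_table)
    ```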

  • LOOK MAA I AM ON FRONT PAGE

    WTF does the author think reasoning is