Apple just proved AI "reasoning" models like Claude, DeepSeek-R1, and o3-mini don't actually reason at all. They just memorize patterns really well.
-
Unlike Markov models, modern LLMs use transformers that attend to full contexts, enabling them to simulate structured, multi-step reasoning (albeit imperfectly). While they don’t initiate reasoning like humans, they can generate and refine internal chains of thought when prompted, and emerging frameworks (like ReAct or Toolformer) allow them to update working memory via external tools. Reasoning is limited, but not physically impossible; it’s evolving beyond simple pattern-matching toward more dynamic and compositional processing.
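To make that loop concrete, here's a minimal sketch of the reason/act/observe pattern that frameworks like ReAct are built around. `query_model` and `run_tool` are hypothetical stand-ins for an LLM call and an external tool, not any real API:

```python
# Toy ReAct-style loop: the model proposes actions, a tool answers,
# and the growing transcript serves as the model's working memory.

def query_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call."""
    if "Observation:" in prompt:
        return "Final Answer: France has roughly 68 million people."
    return "Action: lookup('population of France')"

def run_tool(action: str) -> str:
    """Hypothetical stand-in for an external tool (search, calculator, ...)."""
    return "Observation: about 68 million"

def react_loop(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = query_model(transcript)        # model "thinks" and picks an action
        transcript += step + "\n"
        if step.startswith("Final Answer:"):  # model decides it has enough context
            return step
        transcript += run_tool(step) + "\n"   # tool output fed back into context
    return transcript

print(react_loop("What is the population of France?"))
```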
-
Reasoning is limited
Most people wouldn't call zero of something 'limited'.
-
Reasoning is limited
Most people wouldn't call zero of something 'limited'.
The paper doesn’t say LLMs can’t reason, it shows that their reasoning abilities are limited and collapse under increasing complexity or novel structure.
-
But it still manages to fuck it up.
I've been experimenting with using Claude's Sonnet model in Copilot in agent mode for my job, and one of the things that's become abundantly clear is that certain types of behavior are so heavily represented in the model that it assumes you want them even if you explicitly tell it you don't.
Say you're working in a yarn workspaces project, and you instruct Copilot to build and test a new dashboard using an instruction file. You'll need to include explicit and repeated reminders all throughout the file to use yarn, not NPM, because even though yarn is very popular today, there are so many older examples of using NPM in its model that it's just going to assume that's what you actually want - thereby fucking up your codebase.
I've also had lots of cases where I tell it I don't want it to edit any code, just to analyze and explain something that's there and how to update it... and then I have to stop it from editing code anyway, because halfway through it forgot that I didn't want edits, just explanations.
-
I’ve also had lots of cases where I tell it I don’t want it to edit any code, just to analyze and explain something that’s there and how to update it… and then I have to stop it from editing code anyway, because halfway through it forgot that I didn’t want edits, just explanations.
I find it hilarious that the only people these LLMs mimic are the incompetent ones. I had a coworker who constantly changed things when asked only to explain them.
-
The paper doesn’t say LLMs can’t reason, it shows that their reasoning abilities are limited and collapse under increasing complexity or novel structure.
I agree with the author.
If these models were truly "reasoning," they should get better with more compute and clearer instructions.
The fact that they only work up to a certain point despite increased resources is proof that they are just pattern matching, not reasoning.
-
The "Apple" part. CEOs only care what companies say.
Apple is significantly behind and arrived late to the whole AI hype, so of course it's in their absolute best interest to keep showing how LLMs aren't special or amazingly revolutionary.
They're not wrong, but the motivation is also pretty clear.
-
"It's part of the history of the field of artificial intelligence that every time somebody figured out how to make a computer do something—play good checkers, solve simple but relatively informal problems—there was a chorus of critics to say, 'that's not thinking'." -Pamela McCorduck´.
It's called the AI Effect.As Larry Tesler puts it, "AI is whatever hasn't been done yet.".
That entire paragraph is much better at supporting the precise opposite argument. Computers can beat Kasparov at chess, but they're clearly not thinking when making a move - even if we use the most open biological definitions for thinking.
-
I agree with the author.
If these models were truly "reasoning," they should get better with more compute and clearer instructions.
The fact that they only work up to a certain point despite increased resources is proof that they are just pattern matching, not reasoning.
Performance eventually collapses due to architectural constraints, and this mirrors cognitive overload in humans: reasoning isn’t just about adding compute; it requires mechanisms like abstraction, recursion, and memory. The models’ collapse doesn’t prove “only pattern matching”; it highlights that today’s models simulate reasoning in narrow bands but lack the structure to scale it reliably. That is a limitation of implementation, not a disproof of emergent reasoning.
-
While I hate LLMs with a passion, and my opinion of them boils down to their being glorified search engines and data scrapers, I would ask Apple: how sour are the grapes, eh?
edit: wording
-
This sort of thing has been published a lot for a while now, but why is it assumed that this isn't what human reasoning consists of? Isn't all our reasoning ultimately a form of pattern memorization? I sure feel like it is. So to me all these studies that prove they're "just" memorizing patterns don't prove anything other than that, unless coupled with research on the human brain to prove we do something different.
Humans apply judgment, because they have emotion. LLMs do not possess emotion. Mimicking emotion without ever actually having the capability of experiencing it is sociopathy. An LLM would at best apply patterns like a sociopath.
-
Apple is significantly behind and arrived late to the whole AI hype, so of course it's in their absolute best interest to keep showing how LLMs aren't special or amazingly revolutionary.
They're not wrong, but the motivation is also pretty clear.
Maybe they are so far behind because they jumped on the same train but then failed to achieve what they wanted based on the claims. And then they started digging around.
-
That entire paragraph is much better at supporting the precise opposite argument. Computers can beat Kasparov at chess, but they're clearly not thinking when making a move - even if we use the most open biological definitions for thinking.
No, it shows how certain people misunderstand the meaning of the word.
You have called NPCs in video games "AI" for a decade, yet you were never implying they were somehow intelligent. The whole argument is strangely inconsistent.
-
Like what?
I don’t think there’s any search engine better than Perplexity. And for scientific research Consensus is miles ahead.
-
On first read this sounded like you were challenging the basis of the previous comment. But then you went on to provide a couple of your own examples.
So on that basis after rereading your comment, it sounds like maybe you’re actually looking for recommendations.
I've seen a lot of praise for Kagi over the past year. I've finally started playing around with the free tier and I think it's definitely worth checking out.
-
Humans apply judgment, because they have emotion. LLMs do not possess emotion. Mimicking emotion without ever actually having the capability of experiencing it is sociopathy. An LLM would at best apply patterns like a sociopath.
But for something like solving a Towers of Hanoi puzzle, which is what this study is about, we're not looking for emotional judgements - we're trying to evaluate the logical reasoning capabilities. A sociopath would be equally capable of solving logic puzzles compared to a non-sociopath. In fact, simple computer programs do a great job of solving these puzzles, and they certainly have nothing like emotions. So I'm not sure that emotions have much relevance to the topic of AI or human reasoning and problem solving, at least not this particular aspect of it.
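For reference, this is roughly the kind of "simple computer program" being referred to: a textbook recursive Tower of Hanoi solver (a sketch for illustration, not the code used in the study):

```python
def hanoi(n, source, target, spare, moves=None):
    """Return the optimal move list for an n-disk Tower of Hanoi puzzle."""
    if moves is None:
        moves = []
    if n == 1:
        moves.append((source, target))
    else:
        hanoi(n - 1, source, spare, target, moves)  # park n-1 disks on the spare peg
        moves.append((source, target))              # move the largest disk
        hanoi(n - 1, spare, target, source, moves)  # restack the n-1 disks on top
    return moves

# A 7-disk tower takes 2**7 - 1 = 127 moves; no emotions (or "reasoning") required.
print(len(hanoi(7, "A", "C", "B")))  # 127
```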
As for analogizing LLMs to sociopaths, I think that's a bit odd too. The reason why we (stereotypically) find sociopathy concerning is that a person has their own desires which, in combination with a disinterest in others' feelings, incentivizes them to be deceitful or harmful in some scenarios. But LLMs are largely designed specifically as servile, having no will or desires of their own. If people find it concerning that LLMs imitate emotions, then I think we're giving them far too much credit as sentient autonomous beings - and this is coming from someone who thinks they think in the same way we do! They think like we do, IMO, but they lack a lot of the other subsystems that are necessary for an entity to function in a way that can be considered autonomous/having free will/desires of its own choosing, etc.
-
Why would they "prove" something that's completely obvious?
The burden of proof is on the grifters who have overwhelmingly been making false claims and distorting language for decades.
-
Why would they "prove" something that's completely obvious?
The burden of proof is on the grifters who have overwhelmingly been making false claims and distorting language for decades.
They’re just using the terminology that’s widespread in the field. In a sense, the paper’s purpose is to prove that this terminology is unsuitable.
-
They’re just using the terminology that’s widespread in the field. In a sense, the paper’s purpose is to prove that this terminology is unsuitable.
I understand that people in this "field" regularly use pseudo-scientific language (I actually deleted that part of my comment).
But the terminology has never been suitable so it shouldn't be used in the first place. It pre-supposes the hypothesis that they're supposedly "disproving". They're feeding into the grift because that's what the field is. That's how they all get paid the big bucks.
-
Apple is significantly behind and arrived late to the whole AI hype, so of course it's in their absolute best interest to keep showing how LLMs aren't special or amazingly revolutionary.
They're not wrong, but the motivation is also pretty clear.
They need to convince investors that this delay wasn't due to incompetence. The problem is that this will only be somewhat effective as long as there isn't an innovation that makes AI more effective.
If that happens, Apple shareholders will, at best, ask the company to increase investment in that area or, at worst, to restructure the company, which could also mean a change in CEO.
-
You know, despite not really believing LLM "intelligence" works anywhere like real intelligence, I kind of thought maybe being good at recognizing patterns was a way to emulate it to a point...
But that study seems to prove they're still not even good at that. At first I was wondering how hard the puzzles must have been, and then there's a bit about LLMs finishing 100-move Tower of Hanoi puzzles (on which they were trained) and failing 4-move river crossings. Logically, those problems are very similar... Also, failing to apply a step-by-step solution they were given.
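As an illustration of how mechanically simple river crossings are, a blind breadth-first search solves the classic wolf/goat/cabbage version (used here as a stand-in; the paper tests larger variants) in a few lines:

```python
from collections import deque

# State = the set of things still on the starting bank ("F" is the farmer).
ITEMS = {"F", "wolf", "goat", "cabbage"}
UNSAFE = [{"wolf", "goat"}, {"goat", "cabbage"}]  # pairs that can't be left alone

def safe(bank):
    # A bank is safe if the farmer is present or no forbidden pair is left alone.
    return "F" in bank or not any(pair <= bank for pair in UNSAFE)

def solve():
    start, goal = frozenset(ITEMS), frozenset()
    queue, seen = deque([(start, [])]), {start}
    while queue:
        bank, path = queue.popleft()
        if bank == goal:
            return path
        side = bank if "F" in bank else ITEMS - bank  # the farmer's current side
        for cargo in [None, *sorted(side - {"F"})]:
            moved = {"F"} | ({cargo} if cargo else set())
            new_bank = bank - moved if "F" in bank else bank | moved
            if safe(new_bank) and safe(ITEMS - new_bank) and new_bank not in seen:
                seen.add(new_bank)
                queue.append((new_bank, path + [cargo or "nothing"]))

# Prints the 7-crossing plan:
# ['goat', 'nothing', 'cabbage', 'goat', 'wolf', 'nothing', 'goat']
print(solve())
```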
Computers are awesome at "recognizing patterns" as long as the pattern is a statistical average of some possibly worthless data set. And it really helps if the computer is set up ahead of time to recognize pre-determined patterns.
-
"It's part of the history of the field of artificial intelligence that every time somebody figured out how to make a computer do something—play good checkers, solve simple but relatively informal problems—there was a chorus of critics to say, 'that's not thinking'." -Pamela McCorduck´.
It's called the AI Effect.As Larry Tesler puts it, "AI is whatever hasn't been done yet.".
I'm going to write a program to play tic-tac-toe. If y'all don't think it's "AI", then you're just haters. Nothing will ever be good enough for y'all. You want scientific evidence of intelligence?!?! I can't even define intelligence so take that! \s
Seriously tho. This person is arguing that a checkers program is "AI". It kinda demonstrates the loooong history of this grift.
-
No, it shows how certain people misunderstand the meaning of the word.
You have called NPCs in video games "AI" for a decade, yet you were never implying they were somehow intelligent. The whole argument is strangely inconsistent.
Who is "you"?
Just because some dummies supposedly think that NPCs are "AI", that doesn't make it so. I don't consider checkers to be a litmus test for "intelligence".