ChatGPT 'got absolutely wrecked' by Atari 2600 in beginner's chess match — OpenAI's newest model bamboozled by 1970s logic
-
There are custom GPTs which claim to play at a Stockfish level, or to literally be Stockfish under the hood (I assume the former is still the latter, just not stated explicitly). I haven't tested them, but if they work, I'd say yes. An LLM itself will never be able to play chess or do anything similar unless it outsources that task to another tool that can. And there seem to be GPTs that do exactly that.
As for why we'd need ChatGPT at all when the result comes from Stockfish anyway: it's for the natural-language prompts and responses.
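As a rough illustration (a minimal sketch, assuming the python-chess library and a Stockfish binary at a hypothetical path, not how any particular GPT is actually wired up): the engine does all of the chess, and the LLM would only wrap the result in conversation.

```python
# Minimal sketch: the "chess" is entirely Stockfish; an LLM would only phrase
# the result in natural language. Assumes python-chess is installed and a
# Stockfish binary exists at the (hypothetical) path below.
import chess
import chess.engine

def best_move(fen: str, engine_path: str = "/usr/bin/stockfish") -> str:
    board = chess.Board(fen)
    engine = chess.engine.SimpleEngine.popen_uci(engine_path)
    try:
        result = engine.play(board, chess.engine.Limit(time=0.5))
    finally:
        engine.quit()
    return board.san(result.move)  # e.g. "e4", ready to be phrased conversationally

print(best_move(chess.STARTING_FEN))
```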
It's not an LLM, but Stockfish has used AI under the hood since 2020. It combines a classical alpha-beta search strategy (if I recall correctly) with a neural network (NNUE) that evaluates positions.
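For anyone curious what the classical part looks like, here's a toy sketch of a depth-limited alpha-beta (negamax) search using python-chess; the crude material count below is only a stand-in for the real evaluation, which in Stockfish's case is the NNUE network.

```python
# Toy alpha-beta search in the spirit of a classical engine's search loop.
# evaluate() is a crude material count standing in for a real evaluator.
import chess

PIECE_VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
                chess.ROOK: 5, chess.QUEEN: 9, chess.KING: 0}

def evaluate(board: chess.Board) -> float:
    """Material balance from the perspective of the side to move."""
    score = 0
    for piece_type, value in PIECE_VALUES.items():
        score += value * len(board.pieces(piece_type, board.turn))
        score -= value * len(board.pieces(piece_type, not board.turn))
    return score

def alphabeta(board: chess.Board, depth: int, alpha: float, beta: float) -> float:
    if depth == 0 or board.is_game_over():
        return evaluate(board)
    best = -float("inf")
    for move in board.legal_moves:
        board.push(move)
        score = -alphabeta(board, depth - 1, -beta, -alpha)  # negamax recursion
        board.pop()
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:   # pruning: the opponent would never allow this line
            break
    return best

print(alphabeta(chess.Board(), depth=3, alpha=-float("inf"), beta=float("inf")))
```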
There are some engines of comparable strength that are primarily neural-network based.
lc0 comes to mind. lc0 placed 2nd in the Top Chess Engine Championship in 9 of the past 10 seasons. By comparison, Stockfish is currently on a 10-season win streak in the TCEC.
-
GothamChess has a video of making ChatGPT play chess against Stockfish. Spoiler: ChatGPT does not do well. It plays okay for a few moves, but the moment it gets in trouble it straight up cheats. Telling it to follow the rules of chess doesn't help.
This sort of gets to the heart of LLM-based "AI". That one example to me really shows that there's no actual reasoning happening inside. It's producing answers that statistically look like answers that might be given based on that input.
For some things it even works. But calling this intelligence is dubious at best.
Because it doesn't have any understanding of the rules of chess, or even an internal model of the game state. It just has the text of chess games in its training data and can reproduce the notation, but nothing prevents it from making illegal moves, trying to move or capture pieces that don't exist, incorrectly declaring check/checkmate, or any number of other nonsensical things.
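Which is exactly why the GPTs that do play legal chess bolt an external board or engine onto the model: something outside the LLM has to hold the game state and reject nonsense. A minimal sketch of that validation layer, using python-chess (get_llm_move is a hypothetical stand-in for whatever text the model produces):

```python
# The LLM only emits text; an actual board object has to decide whether that
# text is a legal move. get_llm_move() is a hypothetical placeholder for a
# call to the language model.
import chess

def get_llm_move(board: chess.Board) -> str:
    return "Nf3"  # pretend this is what the model answered

board = chess.Board()
san = get_llm_move(board)
try:
    move = board.parse_san(san)   # raises if the notation is illegal or nonsense
except ValueError:
    print(f"Rejected illegal/unparseable move from the model: {san!r}")
else:
    board.push(move)
    print(f"Accepted {san}, position is now:\n{board}")
```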
-
GothamChess has a video of making ChatGPT play chess against Stockfish. Spoiler: ChatGPT does not do well. It plays okay for a few moves, but the moment it gets in trouble it straight up cheats. Telling it to follow the rules of chess doesn't help.
This sort of gets to the heart of LLM-based "AI". That one example to me really shows that there's no actual reasoning happening inside. It's producing answers that statistically look like answers that might be given based on that input.
For some things it even works. But calling this intelligence is dubious at best.
I think the biggest problem is its very low "test-time adaptability". Even when combined with a reasoning model outputting into its context, the weights don't learn anything outside the immediate context.
I think the solution might be to train a LoRA overlay on the fly against the weights, run inference with that AND the unmodified weights, and then have an overseer model self-evaluate and recompose the raw outputs.
Like humans are way better at answering stuff when it's a collaboration of more than one person. I suspect the same is true of LLMs.
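Just to sketch what that overlay would even mean mechanically (this is illustrative plain PyTorch, not something any current model actually does; the ranks, names and the trivial "overseer" rule are all made up): a low-rank delta B·A sits on top of a frozen weight matrix, so the same input can be run with and without it and something else arbitrates.

```python
# Sketch of the "LoRA overlay + unmodified weights" idea. A frozen linear layer
# gets an optional low-rank delta (B @ A); we run the same input with and
# without the overlay, then a stand-in "overseer" picks between the outputs.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                   # frozen pretrained weights
        out_f, in_f = base.weight.shape
        self.A = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_f, rank))  # zero init: overlay starts as a no-op

    def forward(self, x: torch.Tensor, use_overlay: bool = True) -> torch.Tensor:
        y = self.base(x)
        if use_overlay:
            y = y + x @ self.A.T @ self.B.T           # add the low-rank delta
        return y

layer = LoRALinear(nn.Linear(16, 16))
x = torch.randn(2, 16)
with torch.no_grad():
    y_overlay = layer(x, use_overlay=True)
    y_plain = layer(x, use_overlay=False)

# Stand-in "overseer": here it just keeps whichever output has the smaller norm.
keep_overlay = (y_overlay.norm(dim=-1) < y_plain.norm(dim=-1)).unsqueeze(-1)
chosen = torch.where(keep_overlay, y_overlay, y_plain)
print(chosen.shape)  # torch.Size([2, 16])
```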
-
GPTs which claim to use a stockfish API
Then the actual chess isn't the LLM. If you're using Stockfish, the LLM doesn't add anything; Stockfish is doing everything.
The whole point of the marketing hype is that LLMs can do all kinds of stuff, doubling down with the branding of some approaches as "reasoning" models, which are roughly "similar to 'pre-reasoning' models, but forced to spend more tokens on disposable intermediate generation steps". With this facet of LLM marketing, the promise is that the LLM can "reason" its way through a chess game without any particular enablement. In practice, people who try feeding gobs of chess data into an LLM end up with a model that doesn't even comply with the rules of the game, let alone provide reasonable competitive responses to an opponent.
Then the actual chess isn't the LLM.
And neither did the Atari 2600 win against ChatGPT. Whatever game they ran on it did.
That's my point here: neither the Atari 2600 nor ChatGPT is capable of playing chess on its own. They can only do so if you provide them with the necessary tools, which applies to both of them. Yet only one of them was given those tools here.
-
An LLM is a poor computational/predictive paradigm for playing chess.
Yeah, a lot of them hallucinate illegal moves.
-
The Atari 2600 is just hardware. The software came on plug-in cartridges. Video Chess was released for it in 1979.
-
Well, so much hype has been generated around ChatGPT being close to AGI that it now makes sense to ask questions like "can ChatGPT prove the Riemann hypothesis".
Even the models that pretend to be AGI are not. It's been proven.
-
Then the actual chess isn't the LLM.
And neither did the Atari 2600 win against ChatGPT. Whatever game they ran on it did.
That's my point here: neither the Atari 2600 nor ChatGPT is capable of playing chess on its own. They can only do so if you provide them with the necessary tools, which applies to both of them. Yet only one of them was given those tools here.
Fine: a chess engine capable of running on 1970s electronics that were affordable even at the time will best what marketing folks would have you think is an arbitrarily capable "reasoning" model running on top-of-the-line 2025 hardware.
You can split hairs about "well actually, the 2600 is hardware and a chess engine is the software" but everyone gets the point.
As for assertions that no one should expect an LLM to be a chess engine: well, tell that to the industry asserting that LLMs are now "reasoning" and provide a basis for replacing most of the labor pool. We need stories like this to calibrate expectations in a way common people can understand.
-
I swear every single article critical of current LLMs is like, "The square got BLASTED by the triangle shape when it completely FAILED to go through the triangle shaped hole."
Well, the first and most obvious way to show that AI is bad is to show it doing badly. If it offers that much of a low-hanging fruit for the demonstration... that just further emphasizes the point.
-
Sometimes it seems like most of these AI articles are written by AIs with bad prompts.
Human journalists would hopefully do a little research. A quick search would reveal that researchers have been publishing about this for over a year, so there's no need to sensationalize it. Perhaps the human journalist could have spent a little time talking about why LLMs are bad at chess and how researchers are approaching the problem.
LLMs, on the other hand, are very good at producing clickbait articles with low information content.
In this case it's not even bad prompts, it's a problem domain ChatGPT wasn't designed to be good at. It's like saying modern medicine is clearly bullshit because a doctor loses a basketball game.
-
To be fair, a decent chunk of coding is stupid boilerplate/minutia that varies environment to environment, language to language, library to library.
So an LLM can do some code completion: filling out a bunch of boilerplate that is blatantly obvious, generating the redundant text mandated by certain patterns, and keeping straight details between languages like "does this language want join as a method on a list with a string argument, or vice versa?"
Problem is, this can sometimes be more trouble than it's worth, as miscompletions are annoying.
a decent chunk of coding is stupid boilerplate/minutia that varies
...according to a logic, which means LLMs are bad at it.
-
Ah, you used logic. That's the issue. They don't do that.
-
a decent chunk of coding is stupid boilerplate/minutia that varies
...according to a logic, which means LLMs are bad at it.
I'd say that those details that vary tend not to vary within a language and ecosystem, so a fairly dumb correlative relationship is generally enough. There's no way to use logic to infer that in language X you need to do mylist.join(string) but in language Y you need to do string.join(mylist); it is, however, super easy to recognize tokens that suggest those things and correlate them with the vocabulary that matches the context.
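To make that concrete (Python shown, with the JavaScript form only as a comment): the information content is basically zero, it's pure per-language convention, which is exactly what token-level correlation handles fine.

```python
# Same operation, inverted conventions: which object "owns" join() is pure
# per-language convention, so completing it correctly is pattern recall,
# not reasoning.
words = ["ab", "cd", "ef"]

joined = ",".join(words)   # Python: the separator string owns join()
print(joined)              # ab,cd,ef

# JavaScript equivalent (shown only as a comment for comparison):
#   const joined = words.join(",");   // the array owns join()
```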
Rinse and repeat for things like: do I need to specify a type here, and what's the vocabulary for the best type for a numeric value; this variable that makes sense is missing a declaration; does this look like a genuinely new, distinct variable or just a typo of one that was already declared.
But again, I'm thinking mostly of what can kind of sort of work. My personal experience is that it's wrong so often as to be annoying and gets in the way of more traditional completion behaviors that play it safe, even though those offer less help, particularly for languages like Python or JavaScript.
-
Parrots are actually intelligent though.
Yeah, but not when it comes to understanding human speech. There's a reason that repeating words without really understanding them is called parroting. Gray parrots are the smartest, and some can actually understand language a little bit, which makes them smarter than ChatGPT, which is just high-tech guessing without comprehension.
-
An LLM is a poor computational/predictive paradigm for playing chess.
This just in: a hammer makes a poor screwdriver.
-
This just in: a hammer makes a poor screwdriver.
LLMs are more like a leaf blower though
-
Actually, a very specific model (gpt-3.5-turbo-instruct) was pretty good at chess (around 1700 Elo, if I remember correctly).
I'm impressed, if that's true! In general, an LLM's training cost vs. an LSTM, RNN, or some other DNN architecture better suited to the ruleset is laughably high.
-
Can I fistfight ChatGPT next? I bet I could kick its ass, too.
-
I think the biggest problem is its very low "test-time adaptability". Even when combined with a reasoning model outputting into its context, the weights don't learn anything outside the immediate context.
I think the solution might be to train a LoRA overlay on the fly against the weights, run inference with that AND the unmodified weights, and then have an overseer model self-evaluate and recompose the raw outputs.
Like humans are way better at answering stuff when it's a collaboration of more than one person. I suspect the same is true of LLMs.
Like humans are way better at answering stuff when it’s a collaboration of more than one person. I suspect the same is true of LLMs.
It is.
It's really common for non-language applications of neural networks. If you have an NN that's right some percentage of the time, you can often run the input through a bunch of copies of the NN and average the outputs, and that average is correct a higher percentage of the time.
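A toy way to see it (just a simulation with made-up numbers, not any particular model): if each independent copy is right 70% of the time, a majority vote over nine copies is right roughly 90% of the time.

```python
# Toy simulation of ensembling: independent classifiers that are each right
# 70% of the time, combined by majority vote. All numbers are illustrative.
import random

random.seed(0)
P_CORRECT = 0.7      # accuracy of a single copy (made up)
N_MODELS = 9         # ensemble size (made up)
TRIALS = 100_000

single_hits = 0
ensemble_hits = 0
for _ in range(TRIALS):
    votes = [random.random() < P_CORRECT for _ in range(N_MODELS)]
    single_hits += votes[0]                       # one copy on its own
    ensemble_hits += sum(votes) > N_MODELS // 2   # majority of the copies

print(f"single model:  {single_hits / TRIALS:.3f}")    # ~0.70
print(f"majority vote: {ensemble_hits / TRIALS:.3f}")  # ~0.90
```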
Aider is an open-source AI coding assistant that lets you use one model to plan the coding and a second one to do the actual coding. It works better than doing it in a single pass, even if you assign the same model to planning and coding.
-
GothamChess has a video of making ChatGPT play chess against Stockfish. Spoiler: ChatGPT does not do well. It plays okay for a few moves, but the moment it gets in trouble it straight up cheats. Telling it to follow the rules of chess doesn't help.
This sort of gets to the heart of LLM-based "AI". That one example to me really shows that there's no actual reasoning happening inside. It's producing answers that statistically look like answers that might be given based on that input.
For some things it even works. But calling this intelligence is dubious at best.
ChatGPT versus DeepSeek is hilarious. They both cheat like crazy, and then one side Jedi mind tricks the winner into losing.