linux-nerds.org


ChatGPT 'got absolutely wrecked' by Atari 2600 in beginner's chess match — OpenAI's newest model bamboozled by 1970s logic

Technology

203 posts 136 commenters 5 views

interdimensionalmeme@lemmy.ml

I think the biggest problem is its very limited "test-time adaptability". Even when combined with a reasoning model outputting into its context, the weights do not learn from the immediate context.

I think the solution might be to train a LoRA overlay on the fly against the weights, run inference with that AND the unmodified weights, and then have an overseer model self-evaluate and recompose the raw outputs.

Like humans are way better at answering stuff when it's a collaboration of more than one person. I suspect the same is true of LLMs.
N This user is from outside of this forum
nednobbins@lemm.ee

wrote

#179

Like humans are way better at answering stuff when it’s a collaboration of more than one person. I suspect the same is true of LLMs.

It is.

It's really common for non-language implementations of neural networks. If you have an NN that's right some percentage of the time, you can often run the input through a bunch of copies of the NN, take the average, and that average is correct a higher percentage of the time.
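A minimal sketch of why averaging helps, assuming the copies make independent errors on a yes/no question (real networks trained on similar data only partly satisfy that). The function name `majority_vote_accuracy` and the 70% figure are made up for illustration:

```python
from math import comb

def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n independent voters,
    each correct with probability p, gets a binary answer right.
    n must be odd so that ties cannot happen."""
    if n % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    k_min = n // 2 + 1  # smallest number of correct votes that still wins
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))

# Accuracy of the ensemble as the number of copies grows.
for n in (1, 5, 21):
    print(n, round(majority_vote_accuracy(0.7, n), 3))
```

Under that independence assumption, a single model that is right 70% of the time rises to roughly 84% with five copies voting. Correlated errors shrink the gain, which is why ensembles usually vary the training data or the initialization.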

Aider is an open-source AI coding assistant that lets you use one model to plan the coding and a second one to do the actual coding. It works better than doing it in a single pass, even if you assign the same model to planning and coding.

0
nova_ad_vitum@lemmy.ca

GothamChess has a video of making ChatGPT play chess against Stockfish. Spoiler: ChatGPT does not do well. It plays okay for a few moves but then the moment it gets in trouble it straight up cheats. Telling it to follow the rules of chess doesn't help.

This sort of gets to the heart of LLM-based "AI". That one example to me really shows that there's no actual reasoning happening inside. It's producing answers that statistically look like answers that might be given based on that input.

For some things it even works. But calling this intelligence is dubious at best.
jacksonlamb@lemmy.world

wrote

#180

ChatGPT versus DeepSeek is hilarious. They both cheat like crazy and then one side Jedi-mind-tricks the winner into losing.

2
lovablesidekick@lemmy.world

In this case it's not even bad prompts, it's a problem domain ChatGPT wasn't designed to be good at. It's like saying modern medicine is clearly bullshit because a doctor loses a basketball game.
nednobbins@lemm.ee

wrote

#181

I imagine the "author" did something like, "Search http://google.scholar.com/ find a publication where AI failed at something and write a paragraph about it."

It's not even as bad as the article claims.

Atari isn't great at chess. https://chess.stackexchange.com/questions/24952/how-strong-is-each-level-of-atari-2600s-video-chess
Random LLMs were nearly as good 2 years ago. https://lmsys.org/blog/2023-05-03-arena/
LLMs that are actually trained for chess have done much better. https://arxiv.org/abs/2501.17186

1
schadrach@lemmy.sdf.org

wrote

#182

So they are both masters of troll chess then?

See: King of the Bridge

1
pushbutton@lemmy.world

And yet everybody is selling it to write code.

Last time I checked, coding required logic.
schadrach@lemmy.sdf.org

wrote

#183

A lot of writing code is relatively standard patterns and variations on them. For all but the really interesting parts, you could probably write a sufficiently detailed description and get an LLM to produce functional code that does the thing.

Basically for a bunch of common structures and use cases, the logic already exists and is well known and replicated by enough people in enough places in enough languages that an LLM can replicate it well enough, like literally anyone else who has ever written anything in that language.

0
neilbru@lemmy.world

I'm impressed, if that's true! In general, an LLM's training cost vs. an LSTM, RNN, or some other more appropriate DNN algorithm suitable for the ruleset is laughably high.
takapapatapaka@lemmy.world

wrote

#184

Oh yes, the cost of training is of course a great loss here; it's not optimized at all, and it's stuck at an average level.

Interestingly, I believe some people did research on it and found some parameters in the model that seemed to represent the state of the chess board (as in, they seem to reflect the current state of the board, and when artificially modified, the model takes the modification into account in its play). It was used by a French YouTuber to show how LLMs can somehow have a kind of representation of the world. I can try to get the sources back if you're interested.

1
neilbru@lemmy.world

wrote, last edited by neilbru@lemmy.world

#185

Absolutely interested. Thank you for taking the time to share that.

My career path in neural networks began as a researcher in cancerous tissue object detection for medical diagnostic imaging. It has since switched to generative models for CAD (architecture, product design, game assets, etc.). I don't really mess about with fine-tuning LLMs.

However, I do self-host my own LLMs as code assistants. Thus, I'm only tangentially involved with the current LLM craze.

But it does interest me, nonetheless!

0
13igtyme@lemmy.world

Don't call my fish stupid.
venator@lemmy.nz

wrote

#186

Well, can it climb trees?

0
neilbru@lemmy.world

An LLM is a poor computational/predictive paradigm for playing chess.
bleys@lemmy.world

wrote

#187

The underlying neural network tech is the same as what the best chess AIs (AlphaZero, Leela) use. The problem is, as you said, that ChatGPT is designed specifically as an LLM so it’s been optimized strictly to write semi-coherent text first, and then any problem solving beyond that is ancillary. Which should say a lot about how inconsistent ChatGPT is at solving problems, given that it’s not actually optimized for any specific use cases.

5
lovablesidekick@lemmy.world

wrote

#188

Wouldn't surprise me if an LLM trained on records of chess moves made good chess moves. I just wouldn't expect the deployed version of ChatGPT to generate coherent chess moves based on the general text it's been trained on.

0
neilbru@lemmy.world

wrote, last edited by neilbru@lemmy.world

#189

Yes, I agree wholeheartedly with your clarification.

My career path, as I stated in a different comment in regards to neural networks, is focused on generative DNNs for CAD applications and parametric 3D modeling. Before that, I began as a researcher in cancerous tissue classification and object detection in medical diagnostic imaging.

Thus, large language models are well out of my area of expertise in terms of the architecture of their models.

However, fundamentally it boils down to the fact that the specific large language model used was designed to predict text and not necessarily solve problems/play games to "win"/"survive".

(I admit that I'm just parroting what you stated and maybe rehashing what I stated even before that, but I like repeating and refining in simple terms to practice explaining to laymen and, dare I say, clients. It helps me feel as if I don't come off too pompously when talking about this subject to others; forgive my tedium.)

2
lifecoach5000@lemmy.world

This post did not contain any content.
korhaka@sopuli.xyz

wrote

#190

Is anyone actually surprised at that?

1
jsomae@lemmy.ml

wrote

#191

Using an LLM as a chess engine is like using a power tool as a table leg. Pretty funny honestly, but it's obviously not going to be good at it, at least not without scaffolding.

12
nednobbins@lemm.ee

wrote

#192

I wouldn't either but that's exactly what lmsys.org found.

That blog post had ratings between 858 and 1169. Those are slightly higher than the average rating of human users on popular chess sites. Their latest leaderboard shows them doing even better.

https://lmarena.ai/leaderboard
has one of the Gemini models with a rating of 1470. That's pretty good.
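To put a rating gap in perspective, here is a small sketch of the standard Elo expected-score formula. The function name is mine, and pairing the numbers quoted above is purely illustrative; it says nothing about how those ratings were actually computed or what they were measured against.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (win = 1, draw = 0.5, loss = 0) of player A against
    player B under the standard Elo model with a 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# Illustrative pairings using the ratings mentioned in the post.
print(round(elo_expected_score(1470, 1169), 3))
print(round(elo_expected_score(1169, 858), 3))
```

A gap of about 300 points means the stronger side is expected to score well over 80% of the available points, so the spread between these numbers is not small.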

1
propitiouspanda@lemmy.cafe

wrote

#193

It plays okay for a few moves but then the moment it gets in trouble it straight up cheats.

Lol. More comparisons to how AI is currently like a young child.

0
takapapatapaka@lemmy.world

wrote

#194

Here is the main blog post that I remembered: it has a follow-up, a more scientific version, and builds on two other articles, so you might want to dig into what they mention in the introduction.

It is indeed quite a technical discovery, and it still lacks a complete and wider analysis, but it is very interesting because it kind of invalidates the common gut feeling that LLMs are pure lucky randomness.

0
kent_eh@lemmy.ca

wrote

#195

is like using a power tool as a table leg.

Then again, our corporate lords and masters are trying to replace all manner of skilled workers with those same LLM "AI" tools.

And clearly that will backfire on them and they'll eventually scramble to find people with the needed skills, but in the meantime tons of people will have lost their source of income.

2
fourwaveforms@lemm.ee

wrote

#196

If you don't play chess, the Atari is probably going to beat you as well.

LLMs are only good at things to the extent that they have been well-trained in the relevant areas. Not just learning to predict text string sequences, but reinforcement learning after that, where a human or some other agent says "this answer is better than that one" enough times in enough of the right contexts. It mimics the way humans learn, which is through repeated and diverse exposure.

If they set up a system to train it against some chess program, or (much simpler) simply gave it a tool call, it would do much better. Tool calling already exists and would be by far the easiest way.
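A rough sketch of the tool-call idea, assuming the model emits a JSON request and some host code dispatches it. Everything here is hypothetical: the `best_move` tool name, the JSON shape, and the stub engine, which returns a canned move where a real setup would query an actual chess engine.

```python
import json
from typing import Callable, Dict

def stub_engine_best_move(fen: str) -> str:
    """Hypothetical stand-in for a real chess engine: always answers e2e4."""
    return "e2e4"

# Registry of tools the model is allowed to call (names are made up).
TOOLS: Dict[str, Callable[..., str]] = {"best_move": stub_engine_best_move}

def dispatch_tool_call(model_output: str) -> str:
    """Parse a JSON tool request emitted by the model and run the named tool."""
    request = json.loads(model_output)
    name = request["tool"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**request.get("arguments", {}))

# Pretend the LLM produced this request instead of guessing a move itself.
START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
llm_output = json.dumps({"tool": "best_move", "arguments": {"fen": START_FEN}})
print(dispatch_tool_call(llm_output))
```

The point of the design is that the language model only has to decide when to ask for a move; legality and move quality come from whatever sits behind the tool.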

It could also be instructed to write a chess solver program and then run it, at which point it would be on par with the Atari, but it wouldn't compete well with a serious chess solver.

4
jsomae@lemmy.ml

wrote, last edited by jsomae@lemmy.ml

#197

If you believe LLMs are not good at anything then there should be relatively little to worry about in the long-term, but I am more concerned.

It's not obvious to me that it will backfire for them, because I believe LLMs are good at some things (that is, when they are used correctly, for the correct tasks). Currently they're being applied to far more use cases than they are likely to be good at -- either because they're overhyped or our corporate lords and masters are just experimenting to find out what they're good at and what not. Some of these cases will be like chess, but others will be like code*.

(* not saying LLMs are good at code in general, but for some coding applications I believe they are vastly more efficient than humans, even if a human expert can currently write higher-quality less-buggy code.)

0
arc99@lemmy.world

Hardly surprising. LLMs aren't -thinking-, they're just shitting out the next token for any given input of tokens.
stevedice@sh.itjust.works

wrote

#198

That's exactly what thinking is, though.

0
