ChatGPT 'got absolutely wrecked' by Atari 2600 in beginner's chess match — OpenAI's newest model bamboozled by 1970s logic
-
An LLM is a poor computational/predictive paradigm for playing chess.
Actually, a very specific model (gpt-3.5-turbo-instruct) was pretty good at chess (around 1700 Elo, if I remember correctly).
-
Fair point.
I liked the "upgraded autocompletion", you know, a completion based on the context, just before they pushed it too far with 20 lines of nonsense...
Now I'm thinking of a way of doing the thing, then I receive a 20-line suggestion.
So I'm checking whether that makes sense, losing my momentum, only to realize the suggestion is calling shit that doesn't exist...
Screw that.
The amount of garbage it spits out in autocomplete is distracting. If it's constantly making me 5-10% less productive the many times it's wrong, it needs to save me a lot of time when it is right, and generally, I haven't found that it does.
Yesterday I tried to prompt it to change around 20 call sites for a function whose signature I had changed. Easy, boring and repetitive, something a junior could easily do. And all the models were absolutely clueless about it (using Copilot).
-
Does the author think ChatGPT is in fact an AGI? It's a chatbot. Why would it be good at chess? It's like saying an Atari 2600 running a dedicated chess program can beat Google Maps at chess.
OpenAI has been talking about AGI for years, implying that they are getting closer to it with their products.
Not to even mention all the hype created by the techbros around it.
-
All AIs are the same. They're just scraping content from GitHub, Stack Overflow etc. with a bunch of guardrails slapped on to spew out sentences that conform to their training data, but there is no intelligence. They're super handy for basic code snippets, but anyone using them for anything remotely complex or nuanced will regret it.
I've used agents for implementing entire APIs and front-ends from the ground up with my own customizations and nuances.
I will say that, for my pedantic needs, it typically only gets about 80-90% of the way there so I still have to put fingers to code, but it definitely saves a boat load of time in those instances.
-
OpenAI has been talking about AGI for years, implying that they are getting closer to it with their products.
Not to even mention all the hype created by the techbros around it.
Hey, I didn't say anywhere that corporations don't lie to promote their product, did I?
-
All AIs are the same. They're just scraping content from GitHub, Stack Overflow etc. with a bunch of guardrails slapped on to spew out sentences that conform to their training data, but there is no intelligence. They're super handy for basic code snippets, but anyone using them for anything remotely complex or nuanced will regret it.
One of my mates generated an entire website using Gemini. It was a React web app that tracks inventory for trading card dealers. It actually did come out functional and well-polished. That being said, the AI really struggled with several aspects of the project that humans would not:
- It left database secrets in the code
- The design of the website meant that it was impossible to operate securely
- The quality of the code itself was hot garbage—unreadable and undocumented nonsense that somehow still worked
- It did not break the code into multiple files. It piled everything into a single file
-
You're not wrong, but keep in mind ChatGPT advocates, including the company itself are referring to it as AI, including in marketing. They're saying it's a complete, self-learning, constantly-evolving Artificial Intelligence that has been improving itself since release... And it loses to a 4KB video game program from 1979 that can only "think" 2 moves ahead.
That's totally fair, the company is obviously lying, excuse me "marketing", to promote their product, that's absolutely true.
-
There are custom GPTs which claim to play at Stockfish level, or to literally be Stockfish under the hood (I assume the former is still the latter, just not stated explicitly). I haven't tested them, but if they work, I'd say yes. An LLM by itself will never be able to play chess or do anything similar, unless it outsources that task to another tool that can. And there seem to be GPTs that do exactly that.
As for why we need ChatGPT then when the result comes from Stockfish anyway, it's for the natural language prompts and responses.
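That delegation pattern is easy to sketch: the language-model layer only handles the conversation, and anything chess-shaped gets routed to an engine. Everything below (the function names, the stub engine) is made up for illustration; a real setup would talk to Stockfish over the UCI protocol instead of returning a canned move:

```python
def stub_engine_best_move(fen):
    # Placeholder standing in for a real engine call (e.g., Stockfish over UCI).
    # Always answers 1. e4 regardless of position, which is the point:
    # the "LLM" part never computes chess at all.
    return "e2e4"

def answer(user_message):
    # The conversational layer: parse intent, delegate, wrap the result
    # back into natural language.
    if "best move" in user_message.lower():
        move = stub_engine_best_move("startpos")
        return f"The engine suggests {move}."
    return "I can only help with chess questions in this sketch."

print(answer("What is the best move here?"))  # The engine suggests e2e4.
```

All the chess strength lives in the delegated function; the wrapper only adds the natural-language interface, which is exactly the division of labor described above.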
It's not an LLM, but Stockfish does use AI under the hood and has since 2020: a classical alpha-beta search strategy (if I recall correctly) combined with a neural network (NNUE) for smarter position evaluation.
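The alpha-beta part is plain classical search, nothing neural about it. A minimal sketch over a toy game tree, where leaves are static evaluations and internal nodes are lists of children; this is a textbook illustration, not Stockfish's actual code:

```python
def alphabeta(node, alpha, beta, maximizing):
    # Leaves are static evaluation scores; internal nodes are lists of children.
    if isinstance(node, int):
        return node
    if maximizing:
        best = float("-inf")
        for child in node:
            best = max(best, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # beta cutoff: the opponent will never allow this branch
        return best
    else:
        best = float("inf")
        for child in node:
            best = min(best, alphabeta(child, alpha, beta, True))
            beta = min(beta, best)
            if alpha >= beta:
                break  # alpha cutoff: we already have a better option elsewhere
        return best

# Depth-3 toy tree: max -> min -> max -> leaves.
tree = [[[3, 5], [6, 9]], [[1, 2], [0, -1]]]
print(alphabeta(tree, float("-inf"), float("inf"), True))  # 5
```

Alpha-beta returns exactly the same value as plain minimax; the cutoffs only skip branches that provably cannot change the result, which is why the NNUE evaluation can be bolted onto it without touching the search logic.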
There are some engines of comparable strength that are primarily neural-network based.
lc0 comes to mind. lc0 placed 2nd in the Top Chess Engine Championship in 9 out of the past 10 seasons. By comparison, Stockfish is currently on a 10-season win streak in the TCEC.
-
GothamChess has a video of making ChatGPT play chess against Stockfish. Spoiler: ChatGPT does not do well. It plays okay for a few moves, but the moment it gets in trouble it straight up cheats. Telling it to follow the rules of chess doesn't help.
This sort of gets to the heart of LLM-based "AI". That one example to me really shows that there's no actual reasoning happening inside. It's producing answers that statistically look like answers that might be given based on that input.
For some things it even works. But calling this intelligence is dubious at best.
Because it doesn't have any understanding of the rules of chess, or even an internal model of the game state. It just has the text of chess games in its training data and can reproduce the notation, but it has nothing to prevent it from making illegal moves, trying to move or capture pieces that don't exist, incorrectly declaring check/checkmate, or any number of other nonsensical things.
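For contrast, even a toy engine has to carry exactly the state an LLM emitting notation lacks. A minimal sketch (hypothetical helper, rook moves only, ignoring checks, pins, castling and everything else):

```python
# Board state: square name -> (color, piece). A text predictor has no
# equivalent of this dict; it only has notation that looked plausible before.
board = {"a1": ("white", "rook"), "a5": ("black", "pawn")}

def rook_move_legal(board, src, dst, color):
    """Right piece on the source square, move along a rank or file,
    clear path, no capture of your own piece. Everything else ignored."""
    if board.get(src) != (color, "rook"):
        return False  # no such rook there: the move references a ghost piece
    if src[0] != dst[0] and src[1] != dst[1]:
        return False  # rooks move along a rank or a file only
    files = "abcdefgh"
    if src[0] == dst[0]:  # same file, walk the ranks strictly in between
        lo, hi = sorted((int(src[1]), int(dst[1])))
        path = [src[0] + str(r) for r in range(lo + 1, hi)]
    else:  # same rank, walk the files strictly in between
        lo, hi = sorted((files.index(src[0]), files.index(dst[0])))
        path = [files[f] + src[1] for f in range(lo + 1, hi)]
    if any(sq in board for sq in path):
        return False  # path is blocked by another piece
    return board.get(dst, (None,))[0] != color  # cannot capture own piece

print(rook_move_legal(board, "a1", "a4", "white"))  # True
print(rook_move_legal(board, "a1", "a8", "white"))  # False: pawn on a5 blocks
print(rook_move_legal(board, "b1", "b4", "white"))  # False: no rook on b1
```

Even this crude validator rejects the "move a piece that doesn't exist" and "move through an occupied square" failures described above, because it consults state instead of predicting text.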
-
GothamChess has a video of making ChatGPT play chess against Stockfish. Spoiler: ChatGPT does not do well. It plays okay for a few moves, but the moment it gets in trouble it straight up cheats. Telling it to follow the rules of chess doesn't help.
This sort of gets to the heart of LLM-based "AI". That one example to me really shows that there's no actual reasoning happening inside. It's producing answers that statistically look like answers that might be given based on that input.
For some things it even works. But calling this intelligence is dubious at best.
I think the biggest problem is its very low "test-time adaptability". Even when combined with a reasoning model outputting into its context, the weights do not learn beyond the immediate context.
I think the solution might be to train a LoRA overlay on the fly against the weights, run inference with that AND the unmodified weights, and then have an overseer model self-evaluate and recompose the raw outputs.
Humans are way better at answering stuff when it's a collaboration of more than one person. I suspect the same is true of LLMs.
-
GPTs which claim to use a stockfish API
Then the actual chess isn't LLM. If you are going through Stockfish, then the LLM doesn't add anything; Stockfish is doing everything.
The whole point of the marketing rage is that LLMs can do all kinds of stuff, doubling down with the branding of some approaches as "reasoning" models, which are roughly "similar to 'pre-reasoning', but forcing use of more tokens on disposable intermediate generation steps". With this facet of LLM marketing, the promise would be that the LLM can "reason" itself through a chess game without particular enablement. In practice, people trying to feed gobs of chess data into an LLM end up with an LLM that doesn't even comply with the rules of the game, let alone provide reasonable competitive responses to an opponent.
Then the actual chess isn't LLM.
And neither did the Atari 2600 win against ChatGPT. Whatever game they ran on it did.
That's my point here. The fact that neither Atari 2600 nor ChatGPT are capable of playing chess on their own. They can only do so if you provide them with the necessary tools. Which applies to both of them. Yet only one of them was given those tools here.
-
An LLM is a poor computational/predictive paradigm for playing chess.
Yeah, a lot of them hallucinate illegal moves.
-
The Atari 2600 is just hardware. The software came on plug-in cartridges. Video Chess was released for it in 1979.
-
Well, so much hype has been generated around ChatGPT being close to AGI that it now makes sense to ask questions like "can ChatGPT prove the Riemann hypothesis".
Even the models that pretend to be AGI are not. It's been proven.
-
Then the actual chess isn't LLM.
And neither did the Atari 2600 win against ChatGPT. Whatever game they ran on it did.
That's my point here. The fact that neither Atari 2600 nor ChatGPT are capable of playing chess on their own. They can only do so if you provide them with the necessary tools. Which applies to both of them. Yet only one of them was given those tools here.
Fine: a chess engine capable of running on 1970s electronics that were affordable even at the time will best what marketing folks would have you think is an arbitrarily capable "reasoning" model running on top-of-the-line 2025 hardware.
You can split hairs about "well actually, the 2600 is hardware and the chess engine is software", but everyone gets the point.
As for assertions that no one should expect an LLM to be a chess engine: well, tell that to the industry that is asserting LLMs are now "reasoning" and provide a basis to replace most of the labor pool. We need stories like this to calibrate expectations in a way common people can understand.
-
I swear every single article critical of current LLMs is like, "The square got BLASTED by the triangle shape when it completely FAILED to go through the triangle shaped hole."
Well, the first and most obvious way to show that AI is bad is to show that AI is bad. If it offers that much low-hanging fruit for the demonstration... that just further emphasizes the point.
-
Sometimes it seems like most of these AI articles are written by AIs with bad prompts.
Human journalists would hopefully do a little research. A quick search would reveal that researchers have been publishing about this for over a year, so there's no need to sensationalize it. Perhaps the human journalist could have spent a little time talking about why LLMs are bad at chess and how researchers are approaching the problem.
LLMs on the other hand, are very good at producing clickbait articles with low information content.
In this case it's not even bad prompts, it's a problem domain ChatGPT wasn't designed to be good at. It's like saying modern medicine is clearly bullshit because a doctor loses a basketball game.
-
To be fair, a decent chunk of coding is stupid boilerplate/minutia that varies environment to environment, language to language, library to library.
So LLMs can do some code completion: filling out blatantly obvious boilerplate, generating the redundant text mandated by certain patterns, and keeping straight details between languages like "does this language want join as a method on a list with a string argument, or vice versa?"
Problem is, this can sometimes be more trouble than it's worth, as miscompletions are annoying.
a decent chunk of coding is stupid boilerplate/minutia that varies
...according to a logic, which means LLMs are bad at it.
-
Ah, you used logic. That's the issue. They don't do that.
-
a decent chunk of coding is stupid boilerplate/minutia that varies
...according to a logic, which means LLMs are bad at it.
I'd say that those details that vary tend not to vary within a language and ecosystem, so a fairly dumb correlative relationship is generally enough to be fine. There's no way to use logic to infer that in language X you need to do mylist.join(string) but in language Y you need to do string.join(mylist), but it's super easy to recognize tokens that suggest those things and correlate them to the vocabulary that matches the context.
Rinse and repeat for things like: do I need to specify a type, and what is the name of the best type for a numeric value? This variable is missing a declaration; does it actually look like a new distinct variable, or just a typo of one that was already declared?
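The join example is real, and it's exactly the kind of per-language vocabulary a correlative model can keep straight:

```python
words = ["a", "b", "c"]

# Python hangs join off the separator string, not the list:
print("-".join(words))  # a-b-c

# words.join("-") would raise AttributeError: Python lists have no .join.
# JavaScript flips it: words.join("-") is the idiomatic form there.
```

Nothing about either placement is logically derivable; it's pure convention, which is why pattern-matching on surrounding tokens is often enough to get it right.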
But again, I'm mostly thinking of what kind of sort of can work; my experience personally is that it's wrong so often as to be annoying, and it gets in the way of more traditional completion behaviors that play it safe, though those give less help, particularly for languages like Python or JavaScript.