ChatGPT 'got absolutely wrecked' by Atari 2600 in beginner's chess match — OpenAI's newest model bamboozled by 1970s logic
-
Absolutely interested. Thank you for taking the time to share that.
My career path in neural networks began as a researcher in object detection of cancerous tissue for medical diagnostic imaging. Now it has switched to generative models for CAD (architecture, product design, game assets, etc.). I don't really mess about with fine-tuning LLMs.
However, I do self-host my own LLMs as code assistants. Thus, I'm only tangentially involved with the current LLM craze.
But it does interest me, nonetheless!
wrote on June 11, 2025, 14:38 (last edited):
Here is the main blog post that I remembered: it has a follow-up, a more scientific version, and uses two other articles as a basis, so you might want to dig around in what they mention in the introduction.
It is indeed quite a technical discovery, and it still lacks a complete and wider analysis, but it is very interesting because it kinda invalidates the common gut feeling that LLMs are pure lucky randomness.
-
Using an LLM as a chess engine is like using a power tool as a table leg. Pretty funny honestly, but it's obviously not going to be good at it, at least not without scaffolding.
wrote on June 11, 2025, 18:00 (last edited):
is like using a power tool as a table leg.
Then again, our corporate lords and masters are trying to replace all manner of skilled workers with those same LLM "AI" tools.
And clearly that will backfire on them and they'll eventually scramble to find people with the needed skills, but in the meantime tons of people will have lost their source of income.
-
This post did not contain any content.
wrote on June 11, 2025, 19:44 (last edited):
If you don't play chess, the Atari is probably going to beat you as well.
LLMs are only good at things to the extent that they have been well-trained in the relevant areas. Not just learning to predict text string sequences, but reinforcement learning after that, where a human or some other agent says "this answer is better than that one" enough times in enough of the right contexts. It mimics the way humans learn, which is through repeated and diverse exposure.
If they set up a system to train it against some chess program, or (much simpler) simply gave it a tool call, it would do much better. Tool calling already exists and would be by far the easiest way.
It could also be instructed to write a chess engine and then run it, at which point it would be on par with the Atari, but it wouldn't compete well with a serious chess engine.
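For a rough sketch of what that tool-call route could look like, assuming the python-chess package and a Stockfish binary on PATH (the best_move name and the idea of registering it as a tool are made up for illustration, not from the article):

import chess
import chess.engine

def best_move(fen: str, think_time: float = 0.1) -> str:
    """Return the engine's chosen move (as a UCI string) for a FEN position."""
    board = chess.Board(fen)
    # Assumes a `stockfish` executable is available on PATH.
    engine = chess.engine.SimpleEngine.popen_uci("stockfish")
    try:
        result = engine.play(board, chess.engine.Limit(time=think_time))
        return result.move.uci()
    finally:
        engine.quit()

# An agent framework would expose best_move to the model as a tool; the LLM
# only emits a call like best_move(fen=...), and the engine, not the LLM,
# plays the actual chess.
if __name__ == "__main__":
    print(best_move(chess.STARTING_FEN))  # e.g. "e2e4"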
-
is like using a power tool as a table leg.
Then again, our corporate lords and masters are trying to replace all manner of skilled workers with those same LLM "AI" tools.
And clearly that will backfire on them and they'll eventually scramble to find people with the needed skills, but in the meantime tons of people will have lost their source of income.
wrote on June 11, 2025, 21:11, last edited by jsomae@lemmy.ml on Nov. 6, 2025, 23:12:
If you believe LLMs are not good at anything, then there should be relatively little to worry about in the long term, but I am more concerned.
It's not obvious to me that it will backfire for them, because I believe LLMs are good at some things (that is, when they are used correctly, for the correct tasks). Currently they're being applied to far more use cases than they are likely to be good at -- either because they're overhyped or because our corporate lords and masters are just experimenting to find out what they're good at and what they're not. Some of these cases will be like chess, but others will be like code*.
(* not saying LLMs are good at code in general, but for some coding applications I believe they are vastly more efficient than humans, even if a human expert can currently write higher-quality, less buggy code.)
-
Hardly surprising. LLMs aren't -thinking-; they're just shitting out the next token for any given input of tokens.
wrote on June 11, 2025, 21:43 (last edited):
That's exactly what thinking is, though.
-
This post did not contain any content.
wrote on June 11, 2025, 21:52 (last edited):
2025 Mazda MX-5 Miata 'got absolutely wrecked' by Inflatable Boat in beginner's boat racing match — Mazda's newest model bamboozled by 1930s technology.
-
This post did not contain any content.
wrote on June 12, 2025, 03:46 (last edited):
This is because an LLM is not made for playing chess.
-
If you believe LLMs are not good at anything, then there should be relatively little to worry about in the long term, but I am more concerned.
It's not obvious to me that it will backfire for them, because I believe LLMs are good at some things (that is, when they are used correctly, for the correct tasks). Currently they're being applied to far more use cases than they are likely to be good at -- either because they're overhyped or because our corporate lords and masters are just experimenting to find out what they're good at and what they're not. Some of these cases will be like chess, but others will be like code*.
(* not saying LLMs are good at code in general, but for some coding applications I believe they are vastly more efficient than humans, even if a human expert can currently write higher-quality, less buggy code.)
wrote on June 12, 2025, 15:34 (last edited):
I believe LLMs are good at some things
The problem is that they're being used for all the things, including a large number of tasks that they are not well suited to.
-
I believe LLMs are good at some things
The problem is that they're being used for all the things, including a large number of tasks that they are not well suited to.
wrote on June 12, 2025, 15:49 (last edited):
Yeah, we agree on this point. In the short term it's a disaster. In the long term, assuming AI's capabilities don't continue to improve at the rate they have been, our corporate overlords will only replace the people it is actually worth it to them to replace with AI.
-
That's exactly what thinking is, though.
wrote on June 13, 2025, 04:22, last edited by arc99@lemmy.world:
An LLM is an ordered series of parameterized / weighted nodes which are fed a bunch of tokens; millions of calculations later, it generates the next token to append, and the process repeats. It's like turning a handle on some complex Babbage-esque machine. LLMs use a tiny bit of randomness ("temperature") when choosing the next token so the responses are not identical each time.
But it is not thinking. Not even remotely so. It's a simulacrum. If you want to see this, run ollama with the temperature set to 0, e.g.

ollama run gemma3:4b
>>> /set parameter temperature 0
>>> what is a leaf
You will get the same answer every single time.
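To see why temperature 0 pins the output down, here is a toy sketch (plain NumPy with made-up logits, not any real model's scores) of how temperature enters next-token sampling; at 0 it collapses to argmax, so the same prompt always yields the same token:

import numpy as np

def sample_next_token(logits: np.ndarray, temperature: float) -> int:
    """Sample a next-token index from raw logits at a given temperature."""
    if temperature == 0:
        # Greedy decoding: always the single highest-scoring token,
        # which is why the response is identical on every run.
        return int(np.argmax(logits))
    scaled = logits / temperature          # <1 sharpens, >1 flattens the distribution
    probs = np.exp(scaled - scaled.max())  # softmax, shifted for numerical stability
    probs /= probs.sum()
    return int(np.random.choice(len(logits), p=probs))

# Toy scores for three candidate tokens.
logits = np.array([2.0, 1.0, 0.5])
print(sample_next_token(logits, temperature=0))    # always 0
print(sample_next_token(logits, temperature=1.0))  # varies between runs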
-
An LLM is an ordered series of parameterized / weighted nodes which are fed a bunch of tokens; millions of calculations later, it generates the next token to append, and the process repeats. It's like turning a handle on some complex Babbage-esque machine. LLMs use a tiny bit of randomness ("temperature") when choosing the next token so the responses are not identical each time.
But it is not thinking. Not even remotely so. It's a simulacrum. If you want to see this, run ollama with the temperature set to 0, e.g.

ollama run gemma3:4b
>>> /set parameter temperature 0
>>> what is a leaf
You will get the same answer every single time.
wrote on June 17, 2025, 02:31, last edited by stevedice@sh.itjust.works:
I know what an LLM is doing. You don't know what your brain is doing.
-