ChatGPT 'got absolutely wrecked' by Atari 2600 in beginner's chess match — OpenAI's newest model bamboozled by 1970s logic
-
They used ChatGPT 4o instead of o1 or o3.
Obviously it was going to fail.
Other studies (not all chess-based or against this old chess AI) show similarly lackluster results even when using reasoning models.
Edit: when comparing reasoning models to existing algorithmic solutions.
-
I swear every single article critical of current LLMs is like, "The square got BLASTED by the triangle shape when it completely FAILED to go through the triangle shaped hole."
It's newsworthy when the sellers of squares are saying that nobody will ever need a triangle again, and the shape-sector of the stock market is hysterically pumping money into companies that make or use squares.
-
They’ve been feeding the toddler everybody else’s baby food and claiming they have the right to.
"If we have to ask every time before stealing a little baby food, our morbidly obese toddler cannot survive"
-
ChatGPT has been, hands down, the worst AI coding assistant I've ever used.
It regularly suggests code that doesn't compile or isn't even in the right language.
It generally suggests a chunk of code that is just a copy of the lines I just wrote.
Sometimes it likes to suggest setting the same property like 5 times.
It is absolute garbage and I do not recommend it to anyone.
I’ve had success with splitting a function into two and planning out an overview, though that’s more like talking to myself.
I wouldn’t use it to generate stuff, though.
-
It's newsworthy when the sellers of squares are saying that nobody will ever need a triangle again, and the shape-sector of the stock market is hysterically pumping money into companies that make or use squares.
You get 2 triangles in a single square mate...
CHECKMATE!
-
LLMs are not built for logic.
And yet everybody is selling them to write code.
Last time I checked, coding requires logic.
-
Can ChatGPT actually play chess now? Last I checked, it couldn't remember more than 5 moves of history, so it wouldn't be able to see the true board state and would make illegal moves, take its own pieces, materialize pieces out of thin air, etc.
There are custom GPTs which claim to play at a Stockfish level or to literally be Stockfish under the hood (I assume the former is just the latter, not stated explicitly). Haven't tested them, but if they work, I'd say yes. An LLM itself will never be able to play chess or do anything similar unless it outsources that task to another tool that can. And there seem to be GPTs that do exactly that.
As for why we need ChatGPT then when the result comes from Stockfish anyway, it's for the natural language prompts and responses.
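That delegation pattern is simple enough to sketch: the language model only parses the request and phrases the reply, while a real engine picks the move. Here's a minimal illustration with the engine stubbed out; `engine_best_move` and `chat_respond` are hypothetical names, and a real bridge would talk UCI to a Stockfish process (e.g. via the `python-chess` package) instead of returning canned moves:

```python
# Hypothetical sketch of the "LLM + engine" delegation pattern:
# the LLM layer handles natural language, the engine handles chess.

def engine_best_move(fen: str) -> str:
    """Stub standing in for a real engine call (e.g. Stockfish over UCI).
    A real implementation would send the position to the engine process
    and read back its chosen move."""
    canned = {"startpos": "e2e4"}  # pretend the engine likes 1. e4
    return canned.get(fen, "e7e5")

def chat_respond(user_message: str, fen: str) -> str:
    """The 'LLM' part: recognize a move request and delegate.
    Everything chess-related is outsourced; the model only talks."""
    if "move" in user_message.lower():
        move = engine_best_move(fen)
        return f"I'd play {move} here."
    return "Ask me for a move and I'll consult the engine."

print(chat_respond("What move should I make?", "startpos"))
```

The point being: the chess strength lives entirely in `engine_best_move`; swap the stub for a random-move generator and the "chatbot" plays randomly, no matter how fluent its replies are.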
-
It's newsworthy when the sellers of squares are saying that nobody will ever need a triangle again, and the shape-sector of the stock market is hysterically pumping money into companies that make or use squares.
It's also from a company claiming they're getting closer to creating a morphing shape that can match any hole.
-
It's newsworthy when the sellers of squares are saying that nobody will ever need a triangle again, and the shape-sector of the stock market is hysterically pumping money into companies that make or use squares.
The press release where OpenAI said we'd never need chess players again
-
This post did not contain any content.
Isn't the Atari just a game console, not a chess engine?
Like, Wikipedia doesn't mention anything about the Atari 2600 having a built-in chess engine.
If they were willing to run a chess game on the Atari 2600, why did they not apply the same to ChatGPT? There are custom GPTs which claim to use a stockfish API or play at a similar level.
Framed like this, it's just unfair. Neither platform is designed to deal with the task by itself, but one of them is given the necessary tooling and the other isn't. No matter what you think of ChatGPT, that's not a fair comparison.
-
LLMs useless, confirmed once again.
-
In all fairness, machine learning in chess engines is actually pretty strong.
AlphaZero was developed by the artificial intelligence and research company DeepMind, which was acquired by Google. It is a computer program that reached a virtually unthinkable level of play using only reinforcement learning and self-play in order to train its neural networks. In other words, it was only given the rules of the game and then played against itself many millions of times (44 million games in the first nine hours, according to DeepMind).
AlphaZero - Chess Engines, Chess.com (www.chess.com)
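Not that this toy resembles AlphaZero internally (that's deep networks plus Monte Carlo tree search), but the self-play idea it rests on can be demonstrated at toy scale: a value table that teaches itself Nim (one pile, take 1-3 stones, taking the last stone wins) given nothing but the rules. Everything below is illustrative only:

```python
import random

random.seed(0)

N = 12                       # starting pile size
ACTIONS = (1, 2, 3)          # stones a player may take per turn
Q = {}                       # (pile, action) -> value estimate for the player to move

def q(pile, action):
    return Q.get((pile, action), 0.0)

def choose(pile, eps):
    """Epsilon-greedy choice over legal actions."""
    legal = [a for a in ACTIONS if a <= pile]
    if random.random() < eps:
        return random.choice(legal)
    return max(legal, key=lambda a: q(pile, a))

def self_play_episode(eps=0.2, alpha=0.1):
    """Both 'players' share one value table; each move is credited
    +1 if its mover eventually won the episode, -1 otherwise."""
    pile, history = N, []
    while pile > 0:
        a = choose(pile, eps)
        history.append((pile, a))
        pile -= a
    reward = 1.0                           # last mover took the last stone and won
    for pile_before, a in reversed(history):
        old = q(pile_before, a)
        Q[(pile_before, a)] = old + alpha * (reward - old)
        reward = -reward                   # zero-sum: flip sign each move back

for _ in range(50000):
    self_play_episode()

# Perfect play leaves the opponent a multiple of 4, so from piles
# 9, 10 and 11 the winning move is to take 1, 2 and 3 respectively.
best = {p: max((a for a in ACTIONS if a <= p), key=lambda a: q(p, a))
        for p in (9, 10, 11)}
print(best)
```

No opening book, no heuristics: the table converges on the multiples-of-4 strategy purely from the win/loss signal of games against itself, which is the same feedback loop AlphaZero scales up with neural networks.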
Sure, but machine learning like that is very different from how LLMs are trained and what they output.
-
Tbf, the article should probably mention the fact that machine learning programs designed to play chess blow everything else out of the water.
Machine learning has existed for many years now. The issue is with these funding-hungry new companies taking their LLMs, repackaging them as "AI" and attributing every ML win ever to "AI".
ML programs designed and trained specifically to identify tumors in medical imaging have become good diagnostic tools. But if you read in the news that "AI helps cure cancer", it makes it sound like it was a lone researcher who spent a few minutes engineering the right prompt for Copilot.
Yes, a specifically-designed and finely tuned ML program can now beat the best human chess player, but calling it "AI" and bundling it together with the latest Gemini or Claude iteration's "reasoning capabilities" is intentionally misleading. That's why articles like this one are needed. ML is a useful tool, but far from the "super-human general intelligence" that is meant to replace half of human workers by the power of wishful prompting.
-
Sometimes it seems like most of these AI articles are written by AIs with bad prompts.
Human journalists would hopefully do a little research. A quick search would reveal that researchers have been publishing about this for over a year, so there's no need to sensationalize it. Perhaps the human journalist could have spent a little time talking about why LLMs are bad at chess and how researchers are approaching the problem.
LLMs on the other hand, are very good at producing clickbait articles with low information content.
-
So, it fares as well as the average schmuck, proving it is human
/s
-
It's also from a company claiming they're getting closer to creating a morphing shape that can match any hole.
And yet the company offers no explanation for how, exactly, they're going to get wood to do that.
-
I've found Claude 3.7 and 4.0 and sometimes Gemini variants still leagues better than ChatGPT/Copilot.
Still not perfect, but night and day difference.
I feel like ChatGPT didn't focus on coding and instead focused on mainstream, but I am not an expert.
Gemini will get basic C++, probably the best documented language for beginners out there, right about half of the time.
I think that might even be the problem, honestly, a bunch of new coders post bad code and it's fixed in comments but the LLM CAN'T realize that.
-
Do they though? No one I've talked to, not my coworkers who use it for work, not my friends, not my 72-year-old mother, thinks they are sentient.
Okay, I maybe exaggerated a bit, but a lot of people think it actually knows things, or is actually smart. Which… it's not… at all. It's just pattern recognition. Which was, I assume, the point of showing it can't even beat the goddamn Atari: it cannot think or reason, it's all just copypasta and pattern recognition.
-
Isn't the Atari just a game console, not a chess engine?
Like, Wikipedia doesn't mention anything about the Atari 2600 having a built-in chess engine.
If they were willing to run a chess game on the Atari 2600, why did they not apply the same to ChatGPT? There are custom GPTs which claim to use a stockfish API or play at a similar level.
Framed like this, it's just unfair. Neither platform is designed to deal with the task by itself, but one of them is given the necessary tooling and the other isn't. No matter what you think of ChatGPT, that's not a fair comparison.
GPTs which claim to use a stockfish API
Then the actual chess isn't LLM. If you are going stockfish, then the LLM doesn't add anything, stockfish is doing everything.
The whole point of the marketing rage is that LLMs can do all kinds of stuff, doubling down on this with the branding of some approaches as "reasoning" models, which are roughly "similar to 'pre-reasoning', but forcing use of more tokens on disposable intermediate generation steps". With this facet of LLM marketing, the promise is that the LLM can "reason" itself through a chess game without particular enablement. In practice, people trying to feed gobs of chess data into an LLM end up with an LLM that doesn't even comply with the rules of the game, let alone provide reasonable competitive responses to an opponent.
-
Sometimes it seems like most of these AI articles are written by AIs with bad prompts.
Human journalists would hopefully do a little research. A quick search would reveal that researchers have been publishing about this for over a year, so there's no need to sensationalize it. Perhaps the human journalist could have spent a little time talking about why LLMs are bad at chess and how researchers are approaching the problem.
LLMs on the other hand, are very good at producing clickbait articles with low information content.
GothamChess has a video of making ChatGPT play chess against Stockfish. Spoiler: ChatGPT does not do well. It plays okay for a few moves, but the moment it gets in trouble it straight up cheats. Telling it to follow the rules of chess doesn't help.
This sort of gets to the heart of LLM-based "AI". That one example really shows that there's no actual reasoning happening inside. It's producing answers that statistically look like answers that might be given based on that input.
For some things that even works. But calling this intelligence is dubious at best.
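The "cheating" is exactly what you'd expect from a system with no internal board state: the move it emits only has to look plausible, not be legal. The bookkeeping that catches it is trivial. A small sketch (toy rules only, nothing like full chess legality) of the kind of state-tracking an LLM doesn't do:

```python
# Toy board-state tracker: enough to catch the classic LLM blunders
# (moving a piece that isn't there, capturing your own piece).
# This is NOT full chess legality -- just the bookkeeping part.

board = {"e2": "wP", "d7": "bP", "e1": "wK", "e8": "bK"}  # square -> piece

def try_move(board, src, dst, side):
    piece = board.get(src)
    if piece is None:
        return f"illegal: no piece on {src} (materialized from thin air?)"
    if not piece.startswith(side):
        return f"illegal: {src} holds an opponent's piece"
    target = board.get(dst)
    if target is not None and target.startswith(side):
        return f"illegal: {dst} already holds your own piece"
    board[dst] = board.pop(src)   # apply the move
    return f"ok: {piece} {src}-{dst}"

print(try_move(board, "e2", "e4", "w"))   # fine
print(try_move(board, "a1", "a4", "w"))   # nothing on a1
print(try_move(board, "e4", "e1", "w"))   # would capture own king
```

A dozen lines of dict lookups does what the model can't, because the dict actually *is* the board state, while the model only has a transcript of move-shaped text.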