ChatGPT 'got absolutely wrecked' by Atari 2600 in beginner's chess match — OpenAI's newest model bamboozled by 1970s logic
-
They used ChatGPT 4o instead of o1 or o3.
Obviously it was going to fail.
Other studies (not all chess-based or against this old chess AI) show similarly lackluster results even when using reasoning models.
Edit: when comparing reasoning models to existing algorithmic solutions.
-
I swear every single article critical of current LLMs is like, "The square got BLASTED by the triangle shape when it completely FAILED to go through the triangle shaped hole."
It's newsworthy when the sellers of squares are saying that nobody will ever need a triangle again, and the shape-sector of the stock market is hysterically pumping money into companies that make or use squares.
-
They’ve been feeding the toddler everybody else’s baby food and claiming they have the right to.
"If we have to ask every time before stealing a little baby food, our morbidly obese toddler cannot survive"
-
ChatGPT has been, hands down, the worst AI coding assistant I've ever used.
It regularly suggests code that doesn't compile or isn't even in the right language.
It generally suggests a chunk of code that is just a copy of the lines I just wrote.
Sometimes it likes to suggest setting the same property like 5 times.
It is absolute garbage and I do not recommend it to anyone.
I’ve had success with splitting a function into two and planning out an overview, though that’s more like talking to myself.
I wouldn’t use it to generate stuff, though.
-
It's newsworthy when the sellers of squares are saying that nobody will ever need a triangle again, and the shape-sector of the stock market is hysterically pumping money into companies that make or use squares.
You get 2 triangles in a single square mate...
CHECKMATE!
-
LLMs are not built for logic.
And yet everybody is selling them to write code.
Last time I checked, coding requires logic.
-
Can ChatGPT actually play chess now? Last I checked, it couldn't remember more than 5 moves of history, so it wouldn't be able to see the true board state and would make illegal moves, take its own pieces, materialize pieces out of thin air, etc.
There are custom GPTs which claim to play at a Stockfish level or to literally be Stockfish under the hood (I assume the former is just the latter, not stated explicitly). Haven't tested them, but if they work, I'd say yes. An LLM itself will never be able to play chess or do anything similar unless it outsources that task to another tool that can. And there seem to be GPTs that do exactly that.
As for why we need ChatGPT then when the result comes from Stockfish anyway, it's for the natural language prompts and responses.
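That delegation pattern is simple enough to sketch: the language model only parses the request and phrases the reply, while a real engine picks the move. Here's a minimal illustration with the engine stubbed out; `engine_best_move` and `chat_respond` are hypothetical names, and a real bridge would talk UCI to a Stockfish process (e.g. via the `python-chess` package) instead of returning canned moves:

```python
# Hypothetical sketch of the "LLM + engine" delegation pattern:
# the LLM layer handles natural language, the engine handles chess.

def engine_best_move(fen: str) -> str:
    """Stub standing in for a real engine call (e.g. Stockfish over UCI).
    A real implementation would send the position to the engine process
    and read back its chosen move."""
    canned = {"startpos": "e2e4"}  # pretend the engine likes 1. e4
    return canned.get(fen, "e7e5")

def chat_respond(user_message: str, fen: str) -> str:
    """The 'LLM' part: recognize a move request and delegate.
    Everything chess-related is outsourced; the model only talks."""
    if "move" in user_message.lower():
        move = engine_best_move(fen)
        return f"I'd play {move} here."
    return "Ask me for a move and I'll consult the engine."

print(chat_respond("What move should I make?", "startpos"))
```

The point being: the chess strength lives entirely in `engine_best_move`; swap the stub for a random-move generator and the "chatbot" plays randomly, no matter how fluent its replies are.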
-
It's newsworthy when the sellers of squares are saying that nobody will ever need a triangle again, and the shape-sector of the stock market is hysterically pumping money into companies that make or use squares.
It's also from a company claiming they're getting closer to creating a morphing shape that can match any hole.
-
It's newsworthy when the sellers of squares are saying that nobody will ever need a triangle again, and the shape-sector of the stock market is hysterically pumping money into companies that make or use squares.
The press release where OpenAI said we'd never need chess players again
-
This post did not contain any content.
Isn't the Atari just a game console, not a chess engine?
Like, Wikipedia doesn't mention anything about the Atari 2600 having a built-in chess engine.
If they were willing to run a chess game on the Atari 2600, why did they not apply the same to ChatGPT? There are custom GPTs which claim to use a stockfish API or play at a similar level.
Framed like this, it's just unfair. Neither platform is designed to deal with the task by itself, but one of them is given the necessary tooling and the other isn't. No matter what you think of ChatGPT, that's not a fair comparison.
-
LLMs useless, confirmed once again.
-
In all fairness, machine learning in chess engines is actually pretty strong.
AlphaZero was developed by the artificial intelligence and research company DeepMind, which was acquired by Google. It is a computer program that reached a virtually unthinkable level of play using only reinforcement learning and self-play in order to train its neural networks. In other words, it was only given the rules of the game and then played against itself many millions of times (44 million games in the first nine hours, according to DeepMind).
AlphaZero - Chess Engines, Chess.com (www.chess.com)
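Not that this toy resembles AlphaZero internally (that's deep networks plus Monte Carlo tree search), but the self-play idea it rests on can be demonstrated at toy scale: a value table that teaches itself Nim (one pile, take 1-3 stones, taking the last stone wins) given nothing but the rules. Everything below is illustrative only:

```python
import random

random.seed(0)

N = 12                       # starting pile size
ACTIONS = (1, 2, 3)          # stones a player may take per turn
Q = {}                       # (pile, action) -> value estimate for the player to move

def q(pile, action):
    return Q.get((pile, action), 0.0)

def choose(pile, eps):
    """Epsilon-greedy choice over legal actions."""
    legal = [a for a in ACTIONS if a <= pile]
    if random.random() < eps:
        return random.choice(legal)
    return max(legal, key=lambda a: q(pile, a))

def self_play_episode(eps=0.2, alpha=0.1):
    """Both 'players' share one value table; each move is credited
    +1 if its mover eventually won the episode, -1 otherwise."""
    pile, history = N, []
    while pile > 0:
        a = choose(pile, eps)
        history.append((pile, a))
        pile -= a
    reward = 1.0                           # last mover took the last stone and won
    for pile_before, a in reversed(history):
        old = q(pile_before, a)
        Q[(pile_before, a)] = old + alpha * (reward - old)
        reward = -reward                   # zero-sum: flip sign each move back

for _ in range(50000):
    self_play_episode()

# Perfect play leaves the opponent a multiple of 4, so from piles
# 9, 10 and 11 the winning move is to take 1, 2 and 3 respectively.
best = {p: max((a for a in ACTIONS if a <= p), key=lambda a: q(p, a))
        for p in (9, 10, 11)}
print(best)
```

No opening book, no heuristics: the table converges on the multiples-of-4 strategy purely from the win/loss signal of games against itself, which is the same feedback loop AlphaZero scales up with neural networks.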
Sure, but machine learning like that is very different from how LLMs are trained and what they output.
-
Tbf, the article should probably mention the fact that machine learning programs designed to play chess blow everything else out of the water.
Machine learning has existed for many years now. The issue is with these funding-hungry new companies taking their LLMs, repackaging them as "AI" and attributing every ML win ever to "AI".
ML programs designed and trained specifically to identify tumors in medical imaging have become good diagnostic tools. But if you read in the news that "AI helps cure cancer", it makes it sound like it was a lone researcher who spent a few minutes engineering the right prompt for Copilot.
Yes, a specifically-designed and finely tuned ML program can now beat the best human chess player, but calling it "AI" and bundling it together with the latest Gemini or Claude iteration's "reasoning capabilities" is intentionally misleading. That's why articles like this one are needed. ML is a useful tool, but far from the "super-human general intelligence" that is meant to replace half of human workers by the power of wishful prompting.
-
Sometimes it seems like most of these AI articles are written by AIs with bad prompts.
Human journalists would hopefully do a little research. A quick search would reveal that researchers have been publishing about this for over a year, so there's no need to sensationalize it. Perhaps the human journalist could have spent a little time talking about why LLMs are bad at chess and how researchers are approaching the problem.
LLMs on the other hand, are very good at producing clickbait articles with low information content.
-
So, it fares as well as the average schmuck, proving it is human
/s
-
It's also from a company claiming they're getting closer to creating a morphing shape that can match any hole.
And yet the company offers no explanation for how, exactly, they're going to get wood to do that.
-
I've found Claude 3.7 and 4.0 and sometimes Gemini variants still leagues better than ChatGPT/Copilot.
Still not perfect, but night and day difference.
I feel like ChatGPT didn't focus on coding and instead focused on mainstream, but I am not an expert.
Gemini will get basic C++, probably the best documented language for beginners out there, right about half of the time.
I think that might even be the problem, honestly, a bunch of new coders post bad code and it's fixed in comments but the LLM CAN'T realize that.
-
Do they though? No one I've talked to, not my coworkers who use it for work, not my friends, not my 72-year-old mother, thinks they are sentient.
Okay, I maybe exaggerated a bit, but a lot of people think it actually knows things, or is actually smart. Which… it's not… at all. It's just pattern recognition. Which was, I assume, the point of showing it can't even beat the goddamn Atari: it cannot think or reason, it's all just copypasta and pattern recognition.
-
Isn't the Atari just a game console, not a chess engine?
Like, Wikipedia doesn't mention anything about the Atari 2600 having a built-in chess engine.
If they were willing to run a chess game on the Atari 2600, why did they not apply the same to ChatGPT? There are custom GPTs which claim to use a stockfish API or play at a similar level.
Framed like this, it's just unfair. Neither platform is designed to deal with the task by itself, but one of them is given the necessary tooling and the other isn't. No matter what you think of ChatGPT, that's not a fair comparison.
GPTs which claim to use a stockfish API
Then the actual chess isn't LLM. If you are going stockfish, then the LLM doesn't add anything, stockfish is doing everything.
The whole point of the marketing rage is that LLMs can do all kinds of stuff, doubling down on this with the branding of some approaches as "reasoning" models, which are roughly "similar to 'pre-reasoning', but forcing use of more tokens on disposable intermediate generation steps". With this facet of LLM marketing, the promise is that the LLM can "reason" itself through a chess game without particular enablement. In practice, people trying to feed gobs of chess data into an LLM end up with an LLM that doesn't even comply with the rules of the game, let alone provide reasonable competitive responses to an opponent.
-
Sometimes it seems like most of these AI articles are written by AIs with bad prompts.
Human journalists would hopefully do a little research. A quick search would reveal that researchers have been publishing about this for over a year, so there's no need to sensationalize it. Perhaps the human journalist could have spent a little time talking about why LLMs are bad at chess and how researchers are approaching the problem.
LLMs on the other hand, are very good at producing clickbait articles with low information content.
GothamChess has a video of making ChatGPT play chess against Stockfish. Spoiler: ChatGPT does not do well. It plays okay for a few moves, but the moment it gets in trouble it straight up cheats. Telling it to follow the rules of chess doesn't help.
This sort of gets to the heart of LLM-based "AI". That one example really shows that there's no actual reasoning happening inside. It's producing answers that statistically look like answers that might be given based on that input.
For some things that even works. But calling this intelligence is dubious at best.
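The "cheating" is exactly what you'd expect from a system with no internal board state: the move it emits only has to look plausible, not be legal. The bookkeeping that catches it is trivial. A small sketch (toy rules only, nothing like full chess legality) of the kind of state-tracking an LLM doesn't do:

```python
# Toy board-state tracker: enough to catch the classic LLM blunders
# (moving a piece that isn't there, capturing your own piece).
# This is NOT full chess legality -- just the bookkeeping part.

board = {"e2": "wP", "d7": "bP", "e1": "wK", "e8": "bK"}  # square -> piece

def try_move(board, src, dst, side):
    piece = board.get(src)
    if piece is None:
        return f"illegal: no piece on {src} (materialized from thin air?)"
    if not piece.startswith(side):
        return f"illegal: {src} holds an opponent's piece"
    target = board.get(dst)
    if target is not None and target.startswith(side):
        return f"illegal: {dst} already holds your own piece"
    board[dst] = board.pop(src)   # apply the move
    return f"ok: {piece} {src}-{dst}"

print(try_move(board, "e2", "e4", "w"))   # fine
print(try_move(board, "a1", "a4", "w"))   # nothing on a1
print(try_move(board, "e4", "e1", "w"))   # would capture own king
```

A dozen lines of dict lookups does what the model can't, because the dict actually *is* the board state, while the model only has a transcript of move-shaped text.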