ChatGPT 'got absolutely wrecked' by Atari 2600 in beginner's chess match — OpenAI's newest model bamboozled by 1970s logic
-
To be fair, a decent chunk of coding is stupid boilerplate/minutia that varies environment to environment, language to language, library to library.
So an LLM can do some code completion: filling out boilerplate that is blatantly obvious, generating the redundant text mandated by certain patterns, and keeping straight details between languages, like "does this language want join as a method on a list with a string argument, or vice versa?"
Problem is, this can sometimes be more trouble than it's worth, since miscompletions are annoying.
a decent chunk of coding is stupid boilerplate/minutia that varies
...according to a logic, which means LLMs are bad at it.
-
This post did not contain any content.
Ah, you used logic. That's the issue. They don't do that.
-
a decent chunk of coding is stupid boilerplate/minutia that varies
...according to a logic, which means LLMs are bad at it.
I'd say that those details that vary tend not to vary within a language and ecosystem, so a fairly dumb correlative relationship is generally enough to be fine. There's no way to use logic to infer that in language X you need to do mylist.join(string) but in language Y you need to do string.join(mylist), but it's super easy to recognize tokens that suggest those things and correlate them with the vocabulary that matches the context.
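To make the join example concrete (Python hangs join off the separator string, JavaScript hangs it off the array):

```python
# Python: join is a method on the separator string, taking the list as its argument.
names = ["alice", "bob", "carol"]
csv_line = ", ".join(names)  # -> "alice, bob, carol"
print(csv_line)

# JavaScript, for contrast: join is a method on the array, taking the separator.
#   const csvLine = names.join(", ");
#
# Neither form follows from the other by logic; a completion model just has to have
# seen enough of each language to associate the right shape with the surrounding context.
```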
Rinse and repeat for things like: do I need to specify a type, and what's the name of the best type for this numeric value; this variable that makes sense is missing a declaration; does this look like a genuinely new, distinct variable or just a typo of one that was already declared?
But again, I'm mostly thinking about what can sort of work. My personal experience is that it's wrong so often as to be annoying, and it gets in the way of more traditional completion behaviors that play it safe but offer less help, particularly for languages like Python or JavaScript.
-
Parrots are actually intelligent though.
Yeah, but not when it comes to understanding human speech. There's a reason that repeating words without really understanding them is called parroting. Gray parrots are the smartest, and some can actually understand language a little bit, making them smarter than chat, which is just high-tech guessing without comprehension.
-
An LLM is a poor computational/predictive paradigm for playing chess.
This just in: a hammer makes a poor screwdriver.
-
This just in: a hammer makes a poor screwdriver.
LLMs are more like a leaf blower though
-
Actually, a very specific model (gpt-3.5-turbo-instruct) was pretty good at chess (around 1700 Elo, if I remember correctly).
I'm impressed, if that's true! In general, an LLM's training cost vs. an LSTM, RNN, or some other more appropriate DNN algorithm suitable for the ruleset is laughably high.
-
This post did not contain any content.
Can I fistfight ChatGPT next? I bet I could kick its ass, too.
-
I think the biggest problem is its very limited ability for "test time adaptability". Even when combined with a reasoning model outputting into its context, the weights don't learn anything outside the immediate context.
I think the solution might be to train a LoRA overlay on the fly against the weights, run inference with that AND the unmodified weights, and then have an overseer model self-evaluate and recompose the raw outputs.
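Very roughly, something like this, sketched with Hugging Face transformers and peft; gpt2 and the training details are just stand-ins for whatever you'd actually adapt:

```python
# Rough sketch of "train a LoRA overlay on the fly", then compare the adapted and
# unmodified weights. Purely illustrative; model name, data, and hyperparameters are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_name = "gpt2"  # stand-in for the model you'd actually adapt
tokenizer = AutoTokenizer.from_pretrained(base_name)
base = AutoModelForCausalLM.from_pretrained(base_name)  # kept unmodified

# Wrap a second copy of the frozen base weights with a small trainable LoRA overlay.
lora = get_peft_model(
    AutoModelForCausalLM.from_pretrained(base_name),
    LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"], task_type="CAUSAL_LM"),
)

# "Learn from the immediate context": a few gradient steps on the current conversation.
context = "Rules and prior moves of the current game would go here..."
batch = tokenizer(context, return_tensors="pt")
optimizer = torch.optim.AdamW((p for p in lora.parameters() if p.requires_grad), lr=1e-4)
for _ in range(3):
    loss = lora(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Run inference with both the adapted and the unmodified weights; an overseer model
# would then evaluate and recompose the two candidate outputs (not shown here).
prompt = tokenizer("Next move:", return_tensors="pt")
with torch.no_grad():
    adapted = lora.generate(**prompt, max_new_tokens=20)
    vanilla = base.generate(**prompt, max_new_tokens=20)
print(tokenizer.decode(adapted[0]))
print(tokenizer.decode(vanilla[0]))
```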
Like humans are way better at answering stuff when it's a collaboration of more than one person. I suspect the same is true of LLMs.
Like humans are way better at answering stuff when it’s a collaboration of more than one person. I suspect the same is true of LLMs.
It is.
It's really common for non-language applications of neural networks. If you have an NN that's right some percentage of the time, you can often run the input through a bunch of copies of the NN and average the outputs, and that average is correct a higher percentage of the time.
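A toy version of that averaging effect, with plain noisy estimators standing in for real networks:

```python
# Several noisy "models" are each close to the true answer only part of the time,
# but the average of their outputs is close much more often.
import numpy as np

rng = np.random.default_rng(0)
true_value = 1.0
n_models, n_trials = 9, 10_000

# Each "model" is just the true value plus independent noise.
predictions = true_value + rng.normal(0.0, 1.0, size=(n_trials, n_models))

single_ok = np.mean(np.abs(predictions[:, 0] - true_value) < 0.5)
ensemble_ok = np.mean(np.abs(predictions.mean(axis=1) - true_value) < 0.5)

print(f"single model within tolerance:    {single_ok:.1%}")   # roughly 38%
print(f"9-model average within tolerance: {ensemble_ok:.1%}")  # roughly 87%
```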
Aider is an open-source AI coding assistant that lets you use one model to plan the coding and a second one to do the actual coding. It works better than doing it in a single pass, even if you assign the same model to both planning and coding.
-
GothamChess has a video of making ChatGPT play chess against Stockfish. Spoiler: ChatGPT does not do well. It plays okay for a few moves, but the moment it gets in trouble it straight up cheats. Telling it to follow the rules of chess doesn't help.
This sort of gets to the heart of LLM-based "AI". That one example to me really shows that there's no actual reasoning happening inside. It's producing answers that statistically look like answers that might be given based on that input.
For some things it even works. But calling this intelligence is dubious at best.
ChatGPT versus Deepseek is hilarious. They both cheat like crazy and then one side jedi mind tricks the winner into losing.
-
In this case it's not even bad prompts, it's a problem domain ChatGPT wasn't designed to be good at. It's like saying modern medicine is clearly bullshit because a doctor loses a basketball game.
I imagine the "author" did something like, "Search http://google.scholar.com/ find a publication where AI failed at something and write a paragraph about it."
It's not even as bad as the article claims.
Atari isn't great at chess. https://chess.stackexchange.com/questions/24952/how-strong-is-each-level-of-atari-2600s-video-chess
Random LLMs were nearly as good 2 years ago. https://lmsys.org/blog/2023-05-03-arena/
LLMs that are actually trained for chess have done much better. https://arxiv.org/abs/2501.17186
-
ChatGPT versus Deepseek is hilarious. They both cheat like crazy and then one side jedi mind tricks the winner into losing.
So they are both masters of troll chess then?
See: King of the Bridge
-
And yet everybody is selling it to write code.
The last time I checked, coding required logic.
A lot of writing code is relatively standard patterns and variations on them. For all but the really interesting parts, you could probably write a sufficiently detailed description and get an LLM to produce functional code that does the thing.
Basically for a bunch of common structures and use cases, the logic already exists and is well known and replicated by enough people in enough places in enough languages that an LLM can replicate it well enough, like literally anyone else who has ever written anything in that language.
-
I'm impressed, if that's true! In general, an LLM's training cost vs. an LSTM, RNN, or some other more appropriate DNN algorithm suitable for the ruleset is laughably high.
Oh yes, the training cost is of course a big downside here; it's not optimized at all, and it's stuck at an average level.
Interestingly, I believe some people did research on it and found some parameters in the model that seemed to represent the state of the chess board (as in, they seem to reflect the current state of the board, and when artificially modified, the model takes the modification into account in its play). It was used by a French YouTuber to show how LLMs can have some kind of representation of the world. I can try to dig up the sources if you're interested.
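The way that kind of claim usually gets tested is a linear probe on the hidden activations: if a simple classifier can read a board square out of the internal states, the information is in there. A schematic of the experiment (the arrays here are placeholders for activations you'd actually extract from the model):

```python
# Schematic linear-probe setup: predict one board feature from per-position hidden states.
# With real activations, held-out accuracy well above chance is the evidence that the
# board state is (linearly) represented inside the model. Random data here stays near 50%.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

n_positions, hidden_dim = 2000, 768
activations = np.random.randn(n_positions, hidden_dim)   # placeholder hidden states
square_occupied = np.random.randint(0, 2, n_positions)   # placeholder label for one square

X_train, X_test, y_train, y_test = train_test_split(activations, square_occupied, test_size=0.2)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out probe accuracy:", probe.score(X_test, y_test))
```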
-
Oh yes, the training cost is of course a big downside here; it's not optimized at all, and it's stuck at an average level.
Interestingly, I believe some people did research on it and found some parameters in the model that seemed to represent the state of the chess board (as in, they seem to reflect the current state of the board, and when artificially modified, the model takes the modification into account in its play). It was used by a French YouTuber to show how LLMs can have some kind of representation of the world. I can try to dig up the sources if you're interested.
Absolutely interested. Thank you for your time to share that.
My career path in neural networks began with research on cancerous tissue object detection in medical diagnostic imaging. Now it has shifted to generative models for CAD (architecture, product design, game assets, etc.). I don't really mess about with fine-tuning LLMs.
However, I do self-host my own LLMs as code assistants. Thus, I'm only tangentially involved with the current LLM craze.
But it does interest me, nonetheless!
-
Don't call my fish stupid.
Well, can it climb trees?
-
An LLM is a poor computational/predictive paradigm for playing chess.
The underlying neural network tech is the same as what the best chess AIs (AlphaZero, Leela) use. The problem is, as you said, that ChatGPT is designed specifically as an LLM so it’s been optimized strictly to write semi-coherent text first, and then any problem solving beyond that is ancillary. Which should say a lot about how inconsistent ChatGPT is at solving problems, given that it’s not actually optimized for any specific use cases.
-
I imagine the "author" did something like, "Search http://google.scholar.com/ find a publication where AI failed at something and write a paragraph about it."
It's not even as bad as the article claims.
Atari isn't great at chess. https://chess.stackexchange.com/questions/24952/how-strong-is-each-level-of-atari-2600s-video-chess
Random LLMs were nearly as good 2 years ago. https://lmsys.org/blog/2023-05-03-arena/
LLMs that are actually trained for chess have done much better. https://arxiv.org/abs/2501.17186
Wouldn't surprise me if an LLM trained on records of chess moves made good chess moves. I just wouldn't expect the deployed version of ChatGPT to generate coherent chess moves based on the general text it's been trained on.
-
The underlying neural network tech is the same as what the best chess AIs (AlphaZero, Leela) use. The problem is, as you said, that ChatGPT is designed specifically as an LLM so it’s been optimized strictly to write semi-coherent text first, and then any problem solving beyond that is ancillary. Which should say a lot about how inconsistent ChatGPT is at solving problems, given that it’s not actually optimized for any specific use cases.
Yes, I agree wholeheartedly with your clarification.
My career path, as I stated in a different comment in regards to neural networks, is focused on generative DNNs for CAD applications and parametric 3D modeling. Before that, I began as a researcher in cancerous tissue classification and object detection in medical diagnostic imaging.
Thus, large language models are well out of my area of expertise in terms of the architecture of their models.
However, fundamentally it boils down to the fact that the specific large language model used was designed to predict text and not necessarily solve problems/play games to "win"/"survive".
(I admit that I'm just parroting what you stated and maybe rehashing what I stated even before that, but I like repeating and refining in simple terms to practice explaining to laymen and, dare I say, clients. It helps me feel as if I don't come off too pompously when talking about this subject to others; forgive my tedium.)
-
This post did not contain any content.
Is anyone actually surprised at that?