
AI Chatbots Remain Overconfident — Even When They’re Wrong: Large Language Models appear to be unaware of their own mistakes, prompting concerns about common uses for AI chatbots.

Technology
  • This post did not contain any content.

    Is that a recycled piece from 2023? Because we already knew that.

  • This post did not contain any content.

    Oh shit, they do behave like humans after all.

  • Neither are our brains.

    “Brains are survival engines, not truth detectors. If self-deception promotes fitness, the brain lies. Stops noticing—irrelevant things. Truth never matters. Only fitness. By now you don’t experience the world as it exists at all. You experience a simulation built from assumptions. Shortcuts. Lies. Whole species is agnosiac by default.”

    ― Peter Watts, Blindsight (fiction)

Starting to think we're really not much smarter. "But LLMs tell us what we want to hear!" Been on Facebook lately, or Lemmy?

If nothing else, LLMs have woken me up to how stupid humans are compared to the machines.

  • This post did not contain any content.

    prompting concerns

    Oh you.

• It's easy, just ask the AI "are you sure?" until it stops changing its answer.

    But seriously, LLMs are just advanced autocomplete.

Ah, the Monte Carlo approach to truth.

• They can even get math wrong, which surprised me. I had to tell it the answer was wrong for it to recalculate and then get the correct answer. It was simple percentages of a list of numbers I had asked for.

I once gave it some kind of math problem (how to break down a certain amount of money into bills) and the LLM wrote a Python script for it, ran it, and thus gave me the correct answer. Kind of clever, really.

  • It's like talking with someone who thinks the Earth is flat. There isn't anything to discuss. They're objectively wrong.

    Humans like to anthropomorphize everything. It's why you can see a face on a car's front grille. LLMs are ultra advanced pattern matching algorithms. They do not think or reason or have any kind of opinion or sentience, yet they are being utilized as if they do. Let's see how it works out for the world, I guess.

I think so too, but I am really curious what will happen when we give them "bodies" with sensors so they can explore the world and gather individual "experiences". I could imagine they would act much more human after a while and might even develop some kind of sentience.

    Of course they would also need some kind of memory and self-actualization processes.

  • Every thread about LLMs has to have some guy like yourself saying how LLMs are like humans and smarter than humans for some reason.

Some humans are not as smart as LLMs, I'll give them that.

  • Language models are unsuitable for math problems broadly speaking. We already have good technology solutions for that category of problems. Luckily, you can combine the two - prompt the model to write a program that solves your math problem, then execute it. You're likely to see a lot more success using this approach.

Also, generally the best interfaces for LLMs will combine non-LLM facilities transparently (roughly the pattern sketched below). The LLM might translate the prose into the format the math engine expects, and then an intermediate layer recognizes a tag, submits an excerpt to the math engine, and substitutes the chunk with the math engine's output.

Even when servicing a request to generate an image, the text generation model runs independently of the image generation, and the intermediate layer combines them, which can cause fun disconnects like the guy asking for a full glass of wine. The text generation half is completely oblivious to the image generation half. So it responds playing the role of a graphic artist dutifully doing the work without ever 'seeing' the image; it assumes the image is good because that's consistent with training output. Then the user corrects it, and it goes about admitting that the picture (which it never 'looked' at) was wrong and retries the image generator with the additional context, producing a similarly botched picture.
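To make that concrete, here's a toy version of the tag-and-substitute pattern in Python. The `<math>` tag, `fake_llm`, and `math_engine` are all made up for illustration; this isn't how any particular product actually wires it together.

```python
import re

def fake_llm(prompt: str) -> str:
    """Stand-in for the text model; a real system would call an LLM API here."""
    # Pretend the model answers a word problem by emitting a math tag
    # instead of doing the arithmetic itself.
    return "The total comes to <math>199.99 * 0.15 + 4.50</math>, shipping included."

def math_engine(expression: str) -> str:
    """Stand-in for a proper math backend (SymPy, a CAS, whatever)."""
    # eval() is fine for a toy; a real system would use a safe expression parser.
    return str(eval(expression, {"__builtins__": {}}))

def answer(prompt: str) -> str:
    """Intermediate layer: the LLM writes the expression, the math engine
    computes it, and the result gets spliced back into the reply."""
    draft = fake_llm(prompt)
    return re.sub(r"<math>(.*?)</math>", lambda m: math_engine(m.group(1)), draft)

print(answer("What is 15% of $199.99 plus $4.50 shipping?"))
# Prints the reply with the computed total spliced in where the tag was.
```

The text model never does the arithmetic; it only has to produce something a dumb-but-reliable tool can finish.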

  • This post did not contain any content.

They are not only unaware of their own mistakes, they are unaware of their successes. They are generating content that is, per their training corpus, consistent with the input. This gets eerie, and the 'uncanny valley' of the mistakes is all the more striking, but they are just generating content without any concept of 'mistake' or 'success', or of the content being a model of something else rather than just a blend of stuff from the training data.

    For example:

    Me: Generate an image of a frog on a lilypad.
    LLM: I'll try to create that — a peaceful frog on a lilypad in a serene pond scene. The image will appear shortly below.

    <includes a perfectly credible picture of a frog on a lilypad, request successfully processed>

    Me (lying): That seems to have produced a frog under a lilypad instead of on top.
    LLM: Thanks for pointing that out! I'm generating a corrected version now with the frog clearly sitting on top of the lilypad. It’ll appear below shortly.

    <includes another perfectly credible picture>

It didn't know anything about the picture, it just took the input at its word. A human would have stopped to say "uhh... what do you mean, the lilypad is on water and the frog is on top of that?" Or if the human were really trying to just do the request without clarification, they might have thought "maybe he wanted it from the perspective of a fish, with the frog underwater?" A human wouldn't have gone "you are right, I made a mistake, here I've tried again" and included almost exactly the same thing.

But the training data isn't predominantly people blatantly lying about such obvious things, or second-guessing work that was so obviously done correctly.

  • This post did not contain any content.

    This happened to me the other day with Jippity. It outright lied to me:

    "You're absolutely right. Although I don't have access to the earlier parts of the conversation".

So it said I was right about a particular statement, but it didn't actually know what I'd said. So I told it, you just lied. It kept saying variations of:

    "I didn't lie intentionally"

    "I understand why it seems that way"

    "I wasn't misleading you"

    etc

It flat out lied and tried to gaslight me into thinking I was in the wrong for taking it that way.

  • Neither are our brains.

    “Brains are survival engines, not truth detectors. If self-deception promotes fitness, the brain lies. Stops noticing—irrelevant things. Truth never matters. Only fitness. By now you don’t experience the world as it exists at all. You experience a simulation built from assumptions. Shortcuts. Lies. Whole species is agnosiac by default.”

    ― Peter Watts, Blindsight (fiction)

Starting to think we're really not much smarter. "But LLMs tell us what we want to hear!" Been on Facebook lately, or Lemmy?

If nothing else, LLMs have woken me up to how stupid humans are compared to the machines.

    It's not that they may be deceived, it's that they have no concept of what truth or fiction, mistake or success even are.

Our brains know the concepts and may fall for deceit without recognizing it, but we at least recognize that the concept exists.

An AI generates content that is a blend of material from the training data consistent with extending the given prompt. It only seems to introduce a concept of lying or mistakes when the human injects that into the human half of the prompt. And it will comply just as readily when the human asks it to correct a genuine mistake as when the human asks it to "correct" something that was already right (unless the training data includes a lot of reaffirmation of the material in the face of such doubts).

An LLM can consume more input than a human can gather in multiple lifetimes and still be wonky in generating content, because it needs enough to credibly blend content to extend every conceivable input. It's why so many people used to judging human content get derailed by judging AI content. An AI generates a fantastic answer to an interview question that only solid candidates get right, only to falter "on the job", because the utterly generic interview question looks like millions of samples in the input, but the actual job was niche.

  • This post did not contain any content.

    If you don't know you are wrong, when you have been shown to be wrong, you are not intelligent. So A.I. has become "Adequate Intelligence".

• It's easy, just ask the AI "are you sure?" until it stops changing its answer.

    But seriously, LLMs are just advanced autocomplete.

I kid you not, early on (mid 2023) some guy mentioned using ChatGPT for his work and not even checking the output (he was in some sort of non-techie field that was still in the wheelhouse of text generation). I expressed that LLMs can include some glaring mistakes, and he said he fixed it by always including in his prompt "Do not hallucinate content and verify all data is actually correct."

• They can even get math wrong, which surprised me. I had to tell it the answer was wrong for it to recalculate and then get the correct answer. It was simple percentages of a list of numbers I had asked for.

Fun thing: when it gets the answer right, tell it it was wrong and then watch it apologize and "correct" itself to give the wrong answer.

• I think so too, but I am really curious what will happen when we give them "bodies" with sensors so they can explore the world and gather individual "experiences". I could imagine they would act much more human after a while and might even develop some kind of sentience.

    Of course they would also need some kind of memory and self-actualization processes.

Interaction with the physical world isn't really required for us to evaluate how they deal with "experiences". They have, in principle, access to all sorts of interesting experiences in the online data. Some models have been enabled to fetch internet data and add it to the prompt to help synthesize an answer.

One key thing is they don't bother until directed to. They don't have any desire; they just follow "generate a search query from the prompt, execute the search and fetch the results, treat the combination of the original prompt and the results as the context for generating more content, and return it to the user" (roughly the loop sketched below).

LLM is not a scheme that credibly implies that more LLM == sapient existence. Such a thing may come, but it will be something different from an LLM. LLMs just look crazily like dealing with people.
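For illustration, that loop is roughly this shape. `complete` and `search_web` are made-up stand-ins here, not any real product's API; the point is that every step is the orchestration code's initiative, not the model's:

```python
def complete(prompt: str) -> str:
    """Stand-in for the language model: returns plausible text for the prompt.
    Canned replies here so the sketch actually runs."""
    if prompt.startswith("Write a web search query"):
        return "frog lilypad resting behaviour"
    return "A summary stitched together from the search results above."

def search_web(query: str) -> list[str]:
    """Stand-in for a search backend returning text snippets."""
    return [f"(snippet 1 for '{query}')", f"(snippet 2 for '{query}')"]

def answer_with_retrieval(user_prompt: str) -> str:
    # 1. The model is told to turn the prompt into a search query.
    query = complete(f"Write a web search query for: {user_prompt}")
    # 2. An ordinary search call fetches results; the model plays no part here.
    snippets = search_web(query)
    # 3. The results get pasted into the context and the model extends it.
    context = user_prompt + "\n\nSearch results:\n" + "\n".join(snippets)
    return complete(f"Answer using the material above:\n{context}")

print(answer_with_retrieval("Where do frogs sit on lilypads?"))
```

No step in there is the model "wanting" anything; it is just prompted, fetched for, and prompted again.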

• Nah, their definition is the classical "how confident are you that you got the answer right?". If you read the article, they asked a bunch of people and 4 LLMs a bunch of random questions, then asked the respondent whether they/it were confident their answer was correct, and then checked the answer. The LLMs initially lined up with people (overconfident), but when they iterated, shared results, and asked further questions, the LLMs' confidence increased while people's tended to decrease, mitigating the overconfidence.

But the study still assumes enough intelligence to review past results and adjust accordingly, and disregards the fact that an AI isn't intelligence, it's a word-prediction model based on a data set of written text tending to infinity. It's not assessing the validity of results, it's predicting what the answer is based on all previous inputs. The whole study is irrelevant.

Well, not irrelevant. Lots of our world is trying to treat LLM output as human-like output, so if humans are going to treat LLM output the same way they treat human-generated content, then we have to characterize, for those people, how their expectations break in that context.

So as weird as it may seem to treat a statistical content-extrapolation engine in the context of social science, a great deal of real-world investment wants to treat its output as "person equivalent", and so it must be studied in that context, if for no other reason than to demonstrate to people that it should be considered "weird".
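For what it's worth, the before-and-after comparison described in the quoted comment boils down to something like this; the numbers are invented for the sketch, not taken from the study:

```python
# Toy calibration check: compare stated confidence to actual accuracy.
# A positive gap means overconfidence. All numbers are made up.
trials = [
    # (stated confidence 0..1, was the answer actually correct?)
    (0.90, False), (0.80, True), (0.95, False), (0.70, True), (0.85, False),
]

def overconfidence(results):
    mean_confidence = sum(conf for conf, _ in results) / len(results)
    accuracy = sum(1 for _, correct in results if correct) / len(results)
    return mean_confidence - accuracy

print(f"overconfidence gap: {overconfidence(trials):+.2f}")
# Per the article, people tended to revise their confidence down after
# seeing results, while the LLMs' stated confidence went up.
```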

  • If you don't know you are wrong, when you have been shown to be wrong, you are not intelligent. So A.I. has become "Adequate Intelligence".

    That definition seems a bit shaky. Trump & co. are mentally ill but they do have a minimum of intelligence.

• I kid you not, early on (mid 2023) some guy mentioned using ChatGPT for his work and not even checking the output (he was in some sort of non-techie field that was still in the wheelhouse of text generation). I expressed that LLMs can include some glaring mistakes, and he said he fixed it by always including in his prompt "Do not hallucinate content and verify all data is actually correct."

Ah, well then, if he tells the bot not to hallucinate and to validate the output, there's no reason not to trust the output. After all, you told the bot not to, and we all know that self-regulation works without issue all of the time.

• Ah, well then, if he tells the bot not to hallucinate and to validate the output, there's no reason not to trust the output. After all, you told the bot not to, and we all know that self-regulation works without issue all of the time.

    It gave me flashbacks when the Replit guy complained that the LLM deleted his data despite being told in all caps not to multiple times.

    People really really don't understand how these things work...
