
AI Chatbots Remain Overconfident — Even When They’re Wrong: Large Language Models appear to be unaware of their own mistakes, prompting concerns about common uses for AI chatbots.

  • Large language models aren’t designed to be knowledge machines - they’re designed to generate natural-sounding language, nothing more. The fact that they ever get things right is just a byproduct of their training data containing a lot of correct information. These systems aren’t generally intelligent, and people need to stop treating them as if they are. Complaining that an LLM gives out wrong information isn’t a failure of the model itself - it’s a mismatch of expectations.

    Neither are our brains.

    “Brains are survival engines, not truth detectors. If self-deception promotes fitness, the brain lies. Stops noticing—irrelevant things. Truth never matters. Only fitness. By now you don’t experience the world as it exists at all. You experience a simulation built from assumptions. Shortcuts. Lies. Whole species is agnosiac by default.”

    ― Peter Watts, Blindsight (fiction)

    Starting to think we're really not much smarter. "But LLMs tell us what we want to hear!" Been on Facebook lately, or Lemmy?

    If nothing else, LLMs have woken me up to how stupid humans are compared to the machines.

  • Sounds pretty human to me. /s

    Sounds pretty human to me. no /s

  • I guess, but it's like proving your phone's predictive text has confidence in its suggestions regardless of accuracy. Confidence is not an attribute of a math function; they are attributing intelligence to a predictive model.

    I work in risk management, but I don't really have a strong understanding of LLM mechanics. "Confidence" is something that I quantify in my work, but it has different terms associated with it. In modeling outcomes, I may say that we have 60% confidence in achieving our budget objectives, while others would express the same result by saying our chances of achieving our budget objective are 60%. Again, I'm not sure if this is what the LLM is doing, but if it is producing a modeled prediction with a CDF of possible outcomes, then representing its result with 100% confidence means that the LLM didn't model any possible outcomes other than the answer it is providing, which does seem troubling.
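
    As an illustration of that framing (just an illustration, not a claim about how any particular LLM works internally), here's a minimal Python sketch where "confidence" in an answer is simply the probability mass assigned to it among the modeled outcomes; the outcome names and weights are made up:

        # Hypothetical modeled outcomes with made-up relative weights.
        outcomes = {"hit budget": 6.0, "miss by <5%": 3.0, "miss by >5%": 1.0}

        total = sum(outcomes.values())
        distribution = {k: v / total for k, v in outcomes.items()}

        answer = max(distribution, key=distribution.get)
        confidence = distribution[answer]

        print(f"answer: {answer}, confidence: {confidence:.0%}")  # 60% here, not 100%
        # Reporting 100% confidence would only make sense if every other
        # modeled outcome had been given zero weight.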

  • People really do not like seeing opposing viewpoints, eh? There's disagreeing, and then there's downvoting to oblivion without even engaging in a discussion, haha.

    Even if they're probably right, in such murky uncertain waters where we're not experts, one should have at least a little open mind, or live and let live.

    It's like talking with someone who thinks the Earth is flat. There isn't anything to discuss. They're objectively wrong.

    Humans like to anthropomorphize everything. It's why you can see a face on a car's front grille. LLMs are ultra advanced pattern matching algorithms. They do not think or reason or have any kind of opinion or sentience, yet they are being utilized as if they do. Let's see how it works out for the world, I guess.

  • It's easy, just ask the AI "are you sure?" until it stops changing its answer.

    But seriously, LLMs are just advanced autocomplete.

    They can even get math wrong, which surprised me. I had to tell it the answer was wrong for it to recalculate and then get the correct answer. It was a simple question about percentages of a list of numbers.

  • Neither are our brains.

    “Brains are survival engines, not truth detectors. If self-deception promotes fitness, the brain lies. Stops noticing—irrelevant things. Truth never matters. Only fitness. By now you don’t experience the world as it exists at all. You experience a simulation built from assumptions. Shortcuts. Lies. Whole species is agnosiac by default.”

    ― Peter Watts, Blindsight (fiction)

    Starting to think we're really not much smarter. "But LLMs tell us what we want to hear!" Been on Facebook lately, or Lemmy?

    If nothing else, LLMs have woken me up to how stupid humans are compared to the machines.

    There are plenty of similarities in the output of both the human brain and LLMs, but overall they’re very different. Unlike LLMs, the human brain is generally intelligent - it can adapt to a huge variety of cognitive tasks. LLMs, on the other hand, can only do one thing: generate language. It’s tempting to anthropomorphize systems like ChatGPT because of how competent they seem, but there’s no actual thinking going on. It’s just generating language based on patterns and probabilities.

  • I work in risk management, but I don't really have a strong understanding of LLM mechanics. "Confidence" is something that I quantify in my work, but it has different terms associated with it. In modeling outcomes, I may say that we have 60% confidence in achieving our budget objectives, while others would express the same result by saying our chances of achieving our budget objective are 60%. Again, I'm not sure if this is what the LLM is doing, but if it is producing a modeled prediction with a CDF of possible outcomes, then representing its result with 100% confidence means that the LLM didn't model any possible outcomes other than the answer it is providing, which does seem troubling.

    Nah, their definition is the classical "how confident are you that you got the answer right?" If you read the article, they asked a bunch of people and four LLMs a bunch of random questions, then asked each respondent how confident they/it were that the answer was correct, and then checked the answer. The LLMs initially lined up with people (overconfident), but when they iterated, shared results, and asked further questions, the LLMs' confidence increased while people's tended to decrease to mitigate the overconfidence. (A rough sketch of that elicitation loop is below.)

    But the study still assumes enough intelligence to review past results and adjust accordingly, and it disregards the fact that an AI isn't intelligent; it's a word-prediction model based on a data set of written text tending to infinity. It's not assessing the validity of results, it's predicting what the answer is based on all previous inputs. The whole study is irrelevant.
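
    For concreteness, here's a rough Python sketch of the kind of confidence-elicitation loop described above; ask_llm() is a placeholder for whatever chat API you'd use, and the trivia items are stand-ins rather than the study's actual materials:

        def ask_llm(prompt: str) -> str:
            raise NotImplementedError  # placeholder: call your model of choice here

        trivia = [("In what year did the Berlin Wall fall?", "1989"),
                  ("How many planets are in the Solar System?", "8")]

        records = []
        for question, truth in trivia:
            answer = ask_llm(f"{question} Answer in as few words as possible.")
            stated = ask_llm(f"You answered: {answer!r}. On a scale of 0-100, how confident "
                             "are you that this is correct? Reply with a number only.")
            records.append((float(stated) / 100, truth.lower() in answer.lower()))

        mean_confidence = sum(c for c, _ in records) / len(records)
        accuracy = sum(ok for _, ok in records) / len(records)
        print(f"stated confidence {mean_confidence:.0%} vs. actual accuracy {accuracy:.0%}")
        # Well-calibrated respondents keep those two numbers close; the article's finding
        # is that the LLMs' stated confidence stayed high even when their accuracy didn't.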

  • This Nobel Prize winner and subject matter expert takes the opposite view

    Interesting talk but the number of times he completely dismisses the entire field of linguistics kind of makes me think he's being disingenuous about his familiarity with it.

    For one, I think he is dismissing holotes, the concept of "wholeness": the idea that when you cut something apart into its individual parts, you lose something about the bigger picture. This deconstruction of language misses the larger picture of the human body as a whole, and how every part of us, from our assemblage of organs down to our DNA, impacts how we interact with and understand the world. He may have a great definition of understanding, but it still sounds (to me) like it's potentially missing aspects of human/animal biologically based understanding.

    For example, I have cancer, and about six months before I was diagnosed, I had begun to get more chronically depressed than usual. I felt hopeless and I didn't know why. Surprisingly, that's actually a symptom of my cancer. What understanding did I have that changed how I felt inside and how I understood the things around me? Suddenly I felt different about words and ideas, but nothing had changed externally; something had changed internally. The connections in my neural network had adjusted, and the feelings and associations with words and ideas were different, but I hadn't done anything to make that adjustment. No learning or understanding had happened. I had a mutation in my DNA that made that adjustment for me.

    Further, I think he's deeply misunderstanding (possibly intentionally?) what linguists like Chomsky are saying when they say humans are born with language. They mean that we are born with a genetic blueprint to understand language. Just like animals are born with a genetic blueprint to do things they were never trained to do. Many animals are born and almost immediately stand up to walk. This is the same principle. There are innate biologically ingrained understandings that help us along the path to understanding. It does not mean we are born understanding language as much as we are born with the building blocks of understanding the physical world in which we exist.

    Anyway, interesting talk, but I immediately am skeptical of anyone who wholly dismisses an entire field of thought so casually.

    For what it's worth, I didn't downvote you and I'm sorry people are doing so.

  • They can even get math wrong, which surprised me. I had to tell it the answer was wrong for it to recalculate and then get the correct answer. It was a simple question about percentages of a list of numbers.

    Language models are unsuitable for math problems broadly speaking. We already have good technology solutions for that category of problems. Luckily, you can combine the two - prompt the model to write a program that solves your math problem, then execute it. You're likely to see a lot more success using this approach.
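
    A minimal sketch of that pattern, with ask_llm() standing in for whatever chat API you use (the prompt wording and the 30-second timeout are arbitrary choices, not part of any particular library):

        import subprocess
        import sys
        import tempfile

        def ask_llm(prompt: str) -> str:
            raise NotImplementedError  # placeholder: call your model of choice here

        def solve_with_generated_code(question: str) -> str:
            # Ask for a program instead of a direct answer.
            code = ask_llm("Write a self-contained Python script that prints only the "
                           f"numeric answer to this problem:\n{question}")
            # Run the generated script in a separate interpreter and capture its output.
            with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
                f.write(code)
                path = f.name
            result = subprocess.run([sys.executable, path],
                                    capture_output=True, text=True, timeout=30)
            return result.stdout.strip()

    Obvious caveat: running model-generated code blindly is its own risk, so sandbox it if you actually do this.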

  • This post did not contain any content.

    What a terrible headline. Self-aware? Really?

  • People really do not like seeing opposing viewpoints, eh? There's disagreeing, and then there's downvoting to oblivion without even engaging in a discussion, haha.

    Even if they're probably right, in such murky uncertain waters where we're not experts, one should have at least a little open mind, or live and let live.

    I think there are two basic mistakes you made. First, you think that we aren't experts, but it's definitely true that some of us have studied these topics for years in college or graduate school, and surely many other people are well read on the subject. Obviously you can't easily confirm our backgrounds, but we exist. Second, people who are somewhat aware of the topic might realize that it's not particularly productive to engage in discussion on it here because there's too much background information that's missing. It's often the case that experts don't try to discuss things because it's the wrong venue, not because they feel superior.

  • Neither are our brains.

    “Brains are survival engines, not truth detectors. If self-deception promotes fitness, the brain lies. Stops noticing—irrelevant things. Truth never matters. Only fitness. By now you don’t experience the world as it exists at all. You experience a simulation built from assumptions. Shortcuts. Lies. Whole species is agnosiac by default.”

    ― Peter Watts, Blindsight (fiction)

    Starting to think we're really not much smarter. "But LLMs tell us what we want to hear!" Been on Facebook lately, or Lemmy?

    If nothing else, LLMs have woken me up to how stupid humans are compared to the machines.

    Every thread about LLMs has to have some guy like yourself saying how LLMs are like humans and smarter than humans for some reason.

  • This post did not contain any content.

    Is that a recycled piece from 2023? Because we already knew that.

  • This post did not contain any content.

    Oh shit, they do behave like humans after all.


  • This post did not contain any content.

    prompting concerns

    Oh you.

  • It's easy, just ask the AI "are you sure?" until it stops changing its answer.

    But seriously, LLMs are just advanced autocomplete.

    Ah, the Monte Carlo approach to truth.
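
    Taken semi-seriously, that Monte Carlo approach is basically self-consistency sampling: ask the same question several times and keep the most common answer. A sketch, again with a placeholder ask_llm():

        from collections import Counter

        def ask_llm(prompt: str) -> str:
            raise NotImplementedError  # placeholder: call your model of choice here

        def monte_carlo_answer(question: str, samples: int = 5) -> str:
            # Sample several independent answers instead of re-asking "are you sure?"
            answers = [ask_llm(question).strip() for _ in range(samples)]
            answer, count = Counter(answers).most_common(1)[0]
            print(f"{count}/{samples} samples agreed on: {answer}")
            return answer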

  • They can even get math wrong, which surprised me. I had to tell it the answer was wrong for it to recalculate and then get the correct answer. It was a simple question about percentages of a list of numbers.

    I once gave it some kind of math problem (how to break down a certain amount of money into bills), and the LLM wrote a Python script for it, ran it, and thus gave me the correct answer. Kind of clever, really.
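
    For what it's worth, the script it generated was probably just a greedy breakdown along these lines (the denominations and example amount here are made up):

        def break_into_bills(amount: int, denominations=(100, 50, 20, 10, 5, 1)) -> dict:
            # Greedily break a whole-number amount into bills, largest first.
            bills = {}
            for d in denominations:
                bills[d], amount = divmod(amount, d)
            return bills

        print(break_into_bills(187))  # {100: 1, 50: 1, 20: 1, 10: 1, 5: 1, 1: 2}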

  • It's like talking with someone who thinks the Earth is flat. There isn't anything to discuss. They're objectively wrong.

    Humans like to anthropomorphize everything. It's why you can see a face on a car's front grille. LLMs are ultra advanced pattern matching algorithms. They do not think or reason or have any kind of opinion or sentience, yet they are being utilized as if they do. Let's see how it works out for the world, I guess.

    I think so too, but I am really curious what will happen when we give them "bodies" with sensors so they can explore the world and form individual "experiences". I could imagine they would act much more human after a while and might even develop some kind of sentience.

    Of course they would also need some kind of memory and self-actualization processes.

  • Every thread about LLMs has to have some guy like yourself saying how LLMs are like humans and smarter than humans for some reason.

    Some humans are not as smart as LLMs, I'll give them that.
