linux-nerds.org


AI agents wrong ~70% of time: Carnegie Mellon study

Technology

285 posts, 108 commenters, 722 views

melvin_ferd@lemmy.world

What does "I give it data to put in a formulaic sentence" mean here?

Why not just share the details? I often find a lot of people saying it's doing crazy things who never like to share the details. It's very similar to discussing things with Trump supporters, who do the same shit when pressed on details about stuff they say occurs. Like the same "you're a troll for asking for evidence of my claim" that trumpets do. It's wild how similar it is.

And yes, asking it to do things like iterate over rows isn't how it works. It's getting better but that's not what it's primarily used for. It could be but isn't. It only catches so many tokens. It's getting better and has some persistence but it's nowhere near what its strength is.
davidagain@lemmy.world wrote (#275):

I would be in breach of contract to tell you the details. How about you just stop trying to blame me for the clear and obvious lies that the LLM churned out and start believing that LLMs ARE strikingly fallible, because, buddy, you have your head so far in the sand on this issue it's weird.

The solution to the problem was to realise that an LLM cannot be trusted for accuracy. Even if the first few results are completely accurate, the bullshit will creep in. Don't trust the LLM. Check every fucking thing.

In the end I wrote a quick script that broke the input up on tab characters and wrote the sentence. That's how formulaic it was. I regretted deeply trying to get an LLM to use data.

The frustrating thing is that it is clearly capable of doing the task some of the time, but drifting off into FANTASY is its strong suit, and it doesn't matter how firmly or how often you ask it to be accurate or use the input carefully. It's going to lie to you before long. It's an LLM. Bullshitting is what it does. Get it to do ONE THING only, then check the fuck out of its answer. Don't trust it to tell you the truth any more than you would trust Donald J Trump to.
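For readers wondering what such a "quick script" could look like, here is a minimal sketch in Python. The real sentence, the column names, and the data were never shared in this thread, so `TEMPLATE`, `FIELDS`, and the sample rows below are purely hypothetical placeholders. It only illustrates the general idea: split each line on tab characters and drop the values into a fixed sentence.

```python
from typing import Iterable, Iterator

# Hypothetical template and column names: the actual ones were not disclosed.
TEMPLATE = "{name} scored {score} points in {year}."
FIELDS = ("name", "score", "year")


def row_to_sentence(line: str) -> str:
    """Split one tab-separated line and fill the fixed template."""
    values = line.rstrip("\r\n").split("\t")
    if len(values) != len(FIELDS):
        # Fail loudly instead of guessing, which is the point of not using an LLM here.
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}: {line!r}")
    return TEMPLATE.format(**dict(zip(FIELDS, values)))


def render(lines: Iterable[str]) -> Iterator[str]:
    """Yield one sentence per non-blank input line."""
    for line in lines:
        if line.strip():
            yield row_to_sentence(line)


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    sample = ["Ada\t42\t2024\n", "Grace\t37\t2023\n"]
    for sentence in render(sample):
        print(sentence)
```

To process a real file, `render` could be passed an open file object or `sys.stdin` instead of the sample list. Every output sentence is a deterministic function of its input row, so there is nothing to drift.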
knock_knock_lemmy_in@lemmy.world

Dunno. Ask 10 humans at random to do a task and probably one will do it better than AI. Just not as fast.
davidagain@lemmy.world wrote (#276, last edited by davidagain@lemmy.world):

You're better off asking one human to do the same task ten times. Humans get better and faster at things as they go along. Always slower than an LLM, but an LLM gets more and more likely to veer off on some flight of fancy, further and further from reality, the more it says to you. The chances of it staying factual in the long term are really low.

It's a born bullshitter. It knows a little about a lot, but it has no clue what's real and what's made up, or it doesn't care.

If you want some text quickly that sounds right, and you genuinely don't care whether it is right at all, go for it, use an LLM. It'll be great at that.
eli001@lemmy.world

This post did not contain any content.
vane@lemmy.world wrote (#277):

Reading with CEO mindset. 3 out of 10 employees can be fired.
melvin_ferd@lemmy.world wrote (#278, last edited by melvin_ferd@lemmy.world):

This is crazy. I've literally been saying they are fallible. You're saying you professionally fed an LLM some type of dataset. So I can't really say what it was you're trying to accomplish, but I'm just arguing that processing data is not what they're trained to do. LLMs are incredible tools and I'm tired of trying to act like they're not because people keep using them for things they're not built to do. It's not a fire-and-forget thing. It does need to be supervised and verified. It's not exactly an answer machine. But it's so good at parsing text and documents, summarizing, formatting and acting like a search engine that you can communicate with rather than trying to grok some arcane sentence. Its power is in language applications.

It is so much fun to just play around with and figure out where it can help. I'm constantly doing things on my computer, and it's great for instructions. Especially if I get a problem that's kind of unique and needs a bit of discussion to solve.
davidagain@lemmy.world wrote (#279, last edited by davidagain@lemmy.world):

it’s so good at parsing text and documents, summarizing

No. Not when it matters. It makes stuff up. The less you carefully check every single fucking thing it says, the more likely you are to believe some lies it subtly slipped in as it went along. If truth doesn't matter, go ahead and use LLMs.

If you just want some ideas that you're going to sift through, independently verify and check for yourself with extreme skepticism as if Donald Trump were telling you how to achieve world peace, great, you're using LLMs effectively.

But if you're trusting it, you're doing it very, very wrong and you're going to get humiliated because other people are going to catch you out in repeating an LLM's bullshit.
melvin_ferd@lemmy.world wrote (#280):

If it's as bad as you say, could you give an example of a prompt where it'll tell you incorrect information?
davidagain@lemmy.world wrote (#281):

It's like you didn't listen to anything I ever said, or you discounted everything I said as fiction, but everything your dear LLM said is gospel truth in your eyes. It's utterly irrational. You have to be trolling me now.
melvin_ferd@lemmy.world wrote (#282):

Should be easy if it's that bad though
davidagain@lemmy.world wrote (#283, last edited by davidagain@lemmy.world):

I already told you my experience of the crapness of LLMs and even explained why I can't share the prompt etc. You clearly weren't listening or are incapable of taking in information.

There's also all the testing done by the people talked about in the article we're discussing which you're also irrationally dismissing.

You have extreme confirmation bias.

Everything you hear that disagrees with your absurd faith in the accuracy of the extreme blagging of LLMs gets dismissed with any excuse you can come up with.
melvin_ferd@lemmy.world wrote (#284):

You're projecting here. I'm asking you to give an example of any prompt. You're saying it's so bad that it needs to be babysat because of its errors. I'm only asking for you to give an example, and you're saying that's confirmation bias and acting like I'm being religiously ignorant.
davidagain@lemmy.world wrote (#285):

This is you