linux-nerds.org

AI agents wrong ~70% of time: Carnegie Mellon study

Technology

280 posts 108 commenters 245 views

U upgrayedd1776@sh.itjust.works

sounds like the fault of the researchers not to build better tests or understand the limits of the software to use it right
rekorse@sh.itjust.works

wrote

#268

Are you arguing they should have built a test that makes AI perform better? How are you offended on behalf of AI?

4
T timeworntraveler@lemmy.dbzer0.com

you're right, the dumb of AI is completely comparable to the dumb of human, there's no difference worth talking about, sorry i even spoke the fuck up
tja@programming.dev

wrote

#269

No worries.

0
E eli001@lemmy.world

This post did not contain any content.
sircac@lemmy.world

wrote

#270

Why would they be right beyond word sequence frequencies?

0
M melvin_ferd@lemmy.world

There's a sleep button on my laptop. Doesn't mean I would use it.

I'm just trying to say you're talking about a feature that everyone kind of knows doesn't work. ChatGPT is not trained to do calculations well.

I just like technology, and I fully believe the left's hatred of it is not logical. I believe it stems from a lot of media BS and headlines. Why there's this push from the media is something I would like to know more about. But overall, I see a lot of the same makers of bullshit yellow journalism for this stuff on the left as I do for similar bullshit in right-wing spaces towards other things.
davidagain@lemmy.world

wrote

#271

Again with dismissing the evidence of my own eyes!

I wasn't asking it to do calculations, I was asking it to put the data into a super formulaic sentence. It was good at the first couple of rows then it would get stuck in a rut and start lying. It was crap. A seven year old would have done it far better, and if I'd told a seven year old that they had made a couple of mistakes and to check it carefully, they would have done.

Again, I didn't read it in a fucking article, I read it on my fucking computer screen, so if you'd stop fucking telling me I'm stupid for using it the way it fucking told me I could use it, or that I'm stupid for believing what the media tell me about LLMs, when all I'm doing is telling you my own experience, you'd sound a lot less like a desperate troll or someone who is completely unable to assimilate new information that differs from your dogma.

0
K knock_knock_lemmy_in@lemmy.world

That looks better. Even with a fair coin, 10 heads in a row is almost impossible.

And if you are feeding the output back into a new instance of a model then the quality is highly likely to degrade.
davidagain@lemmy.world

wrote

#272

Whereas if you ask a human to do the same thing ten times, the probability that they get all ten right is astronomically higher than 0.0000059049.
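
For reference, 0.0000059049 is 0.3 raised to the tenth power: the chance of ten correct results in a row if each attempt succeeds about 30% of the time, which is what the headline's ~70% error rate implies. A minimal sketch of that arithmetic, assuming the attempts are independent (the function name is only illustrative):

```python
from fractions import Fraction


def chance_all_correct(success_rate: Fraction, attempts: int) -> Fraction:
    """Probability that every one of `attempts` independent tries succeeds."""
    return success_rate ** attempts


# Assumption: each attempt is independent and succeeds 30% of the time.
agent = chance_all_correct(Fraction(3, 10), 10)
# A fair coin landing heads ten times in a row, for comparison.
coin = chance_all_correct(Fraction(1, 2), 10)

print(float(agent))  # 5.9049e-06
print(float(coin))   # 0.0009765625
```

Exact fractions are used so the result is not blurred by floating-point rounding; the conversion to `float` happens only for display.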

0
melvin_ferd@lemmy.world

wrote (last edited by melvin_ferd@lemmy.world)

#273

What does "I give it data to put in a formulaic sentence" mean here?

Why not just share the details? I often find a lot of people saying it's doing crazy things who never like to share the details. It's very similar to discussing things with Trump supporters, who do the same shit when pressed on details about stuff they say occurs. Like the same "you're a troll for asking for evidence of my claim" that Trumpers do. It's wild how similar it is.

And yes, asking it to do things like iterate over rows isn't how it works. It's getting better, but that's not what it's primarily used for. It could be but isn't. It only catches so many tokens. It's getting better and has some persistence, but that's nowhere near what its strength is.

0
knock_knock_lemmy_in@lemmy.world

wrote

#274

Dunno. Ask 10 humans at random to do a task and probably one will do it better than AI. Just not as fast.

0
davidagain@lemmy.world

wrote

#275

I would be in breach of contract to tell you the details. How about you just stop trying to blame me for the clear and obvious lies that the LLM churned out and start believing that LLMs ARE strikingly fallible, because, buddy, you have your head so far in the sand on this issue it's weird.

The solution to the problem was to realise that an LLM cannot be trusted for accuracy: even if the first few results are completely accurate, the bullshit will creep in. Don't trust the LLM. Check every fucking thing.

In the end I wrote a quick script that broke the input up on tab characters and wrote the sentence. That's how formulaic it was. I regretted deeply trying to get an LLM to use data.

The frustrating thing is that it is clearly capable of doing the task some of the time, but drifting off into FANTASY is its strong suit, and it doesn't matter how firmly or how often you ask it to be accurate or use the input carefully. It's going to lie to you before long. It's an LLM. Bullshitting is what it does. Get it to do ONE THING only, then check the fuck out of its answer. Don't trust it to tell you the truth any more than you would trust Donald J Trump to.
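
A minimal sketch of the kind of script described above, not the actual one: the real data and sentence were not shared, so the column names and sentence template here are hypothetical. It splits each input line on tab characters and fills a fixed template, so every row is handled the same way.

```python
# Hypothetical template and columns; the original data format was not shared.
TEMPLATE = "{name} scored {score} in {subject}."
COLUMNS = ("name", "score", "subject")


def row_to_sentence(line: str) -> str:
    """Split one tab-separated line and fill the fixed sentence template."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError(
            f"expected {len(COLUMNS)} fields, got {len(fields)}: {line!r}"
        )
    return TEMPLATE.format(**dict(zip(COLUMNS, fields)))


def rows_to_sentences(text: str) -> list[str]:
    """Convert every non-empty line of tab-separated text into a sentence."""
    return [row_to_sentence(line) for line in text.splitlines() if line.strip()]


sample = "Ada\t91\tmaths\nGrace\t88\tphysics\n"
for sentence in rows_to_sentences(sample):
    print(sentence)
```

Because the output is fully determined by the input, the same function could also serve as a checker: generate the expected sentence for each row and compare it with whatever an LLM produced.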

0
davidagain@lemmy.world

wrote (last edited by davidagain@lemmy.world)

#276

You're better off asking one human to do the same task ten times. Humans get better and faster at things as they go along. They're always slower than an LLM, but an LLM gets more and more likely to veer off on some flight of fancy, further and further from reality, the more it says to you. The chances of it staying factual in the long term are really low.

It's a born bullshitter. It knows a little about a lot, but it has no clue what's real and what's made up, or it doesn't care.

If you want some text quickly, that sounds right, but you genuinely don't care whether it is right at all, go for it, use an LLM. It'll be great at that.

0
vane@lemmy.world

wrote

#277

Reading with CEO mindset. 3 out of 10 employees can be fired.

1
melvin_ferd@lemmy.world

wrote (last edited by melvin_ferd@lemmy.world)

#278

This is crazy. I've literally been saying they are fallible. You're saying you professionally fed an LLM some type of dataset. So I can't really say what it was you were trying to accomplish, but I'm just arguing that processing data is not what they're trained to do. LLMs are incredible tools, and I'm tired of trying to act like they're not because people keep using them for things they're not built to do. It's not a fire-and-forget thing. It does need to be supervised and verified. It's not exactly an answer machine. But it's so good at parsing text and documents, summarizing, formatting and acting like a search engine that you can communicate with rather than trying to grok some arcane sentence. Its power is in language applications.

It is so much fun to just play around with and figure out where it can help. I'm constantly doing things on my computer, and it's great for instructions. Especially if I get a problem that's kind of unique and needs a bit of discussion to solve.

0
davidagain@lemmy.world

wrote (last edited by davidagain@lemmy.world)

#279

"it’s so good at parsing text and documents, summarizing"

No. Not when it matters. It makes stuff up. The less you carefully check every single fucking thing it says, the more likely you are to believe some lies it subtly slipped in as it went along. If truth doesn't matter, go ahead and use LLMs.

If you just want some ideas that you're going to sift through, independently verify and check for yourself with extreme skepticism as if Donald Trump were telling you how to achieve world peace, great, you're using LLMs effectively.

But if you're trusting it, you're doing it very, very wrong and you're going to get humiliated because other people are going to catch you out in repeating an LLM's bullshit.

0
melvin_ferd@lemmy.world

wrote

#280

If it's as bad as you say, could you give an example of a prompt where it'll tell you incorrect information?

0
