linux-nerds.org

AI agents wrong ~70% of time: Carnegie Mellon study

Technology

154 posts, 76 posters, 3 views

melvin_ferd@lemmy.world wrote:

OK, what about tech journalists who produced articles with those misunderstandings? Surely they know better, yet they still produce articles like this. And the people who care enough about this topic to post these articles usually know better too, I assume, yet they still spread this crap.

#83 jordanz@lemmy.world

I liked when the Chicago Sun-Times put out a summer reading list and only a third of the books on it were real. Each book had a summary of the plot next to it too. They later apologized for it.

hertzdentalbar@lemmy.blahaj.zone wrote:

So no different than answers from middle management, I guess?

#84 suburban_hillbilly@lemmy.ml

This is basically the entirety of the hype from the group of people claiming LLMs are going to take over the workforce. Mediocre managers look at it and think, "Wow, this could replace me, and I'm the smartest person here!"

Sure, Jan.

jsomae@lemmy.ml wrote:

I think everyone in the universe is aware of how LLMs work by now; you don't need to explain it to someone just because they think LLMs are more useful than you do.

IDK what you mean by glazing, but if by "glaze" you mean "understanding the potential threat of AI to society instead of hiding under a rock and pretending it's as useless as a plastic radio," then no, I won't stop.

#85 outhouseperilous@lemmy.dbzer0.com

It's absolutely dangerous, but it doesn't have to work even a little to do damage; hell, it already has. Your thing just makes it sound much more capable than it is. And it is not.

Also, it's not AI.

In reply to eli001@lemmy.world (post without text content)

#86 affidavit@lemmy.world

"...for multi-step tasks"
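A back-of-the-envelope way to see why the multi-step qualifier matters, assuming (as a simplification, not the study's method) that each step of a task succeeds independently with the same probability p. The numbers are illustrative only, not from the study:

```latex
P(\text{all } n \text{ steps succeed}) = p^{n},
\qquad \text{e.g. } p = 0.9,\ n = 10 \;\Rightarrow\; 0.9^{10} \approx 0.35
```

Under that assumption, individually reliable steps can still compound into a low end-to-end success rate.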

narrativebear@lemmy.world wrote:

The ones being implemented into emergency call centers are better though? Right?

#87 tollana1234567@lemmy.today

I wonder how the evil Palantir uses its AI.

In reply to #84 by suburban_hillbilly@lemmy.ml

#88 sheogorath@lemmy.world

I won't tolerate Jan slander here. I know he's just a builder, but his life path has the highest probability of producing a great person!

In reply to eli001@lemmy.world (post without text content)

#89 kameecoding@lemmy.world

For me, as a software developer, the accuracy is more in the 95%+ range.

On one hand, the built-in Copilot chat widget in IntelliJ basically replaces a lot of my Google queries.

On the other hand, it is rather fucking good at executing rewrites that are a fucking chore to do manually but easy for Copilot.

Imagine you have a script that initializes your DB with some test data. You have an INSERT INTO statement with lots of columns and rows, like so:

INSERT INTO <table> (column1, ..., columnN)
VALUES (row 1),
       (row 2),
       ...
       (row n);

Adding a new column with test data to every row is a PITA, but Copilot handles it without issue.
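To make the chore concrete: adding a column means extending the column list and appending one value to every VALUES tuple in the same position. A minimal Python sketch using the standard-library `sqlite3` module; the table `users`, its columns, and the helpers `add_column` and `build_insert` are hypothetical, and this shows the mechanical transformation itself, not how Copilot performs it:

```python
import sqlite3

# Hypothetical seed data for a hypothetical "users" table: column names plus one tuple per row.
COLUMNS = ["id", "name", "email"]
ROWS = [
    (1, "alice", "alice@example.com"),
    (2, "bob", "bob@example.com"),
    (3, "carol", "carol@example.com"),
]


def add_column(columns, rows, new_column, values):
    """Return a new column list and new rows, with one extra value appended to each row."""
    if len(values) != len(rows):
        raise ValueError("need exactly one new value per row")
    return columns + [new_column], [row + (value,) for row, value in zip(rows, values)]


def build_insert(table, columns, rows):
    """Build a parameterized multi-row INSERT INTO ... VALUES (...), (...) statement."""
    row_placeholder = "(" + ", ".join("?" for _ in columns) + ")"
    sql = (
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES "
        + ", ".join(row_placeholder for _ in rows)
    )
    # Flatten the rows so the parameters line up with the "?" placeholders in order.
    params = [value for row in rows for value in row]
    return sql, params


if __name__ == "__main__":
    columns, rows = add_column(COLUMNS, ROWS, "is_active", [1, 0, 1])
    sql, params = build_insert("users", columns, rows)

    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT, is_active INTEGER)"
    )
    conn.execute(sql, params)
    print(conn.execute("SELECT name, is_active FROM users ORDER BY id").fetchall())
    # prints [('alice', 1), ('bob', 0), ('carol', 1)]
```

Keeping the rows as data and generating the statement is one design choice; in a hand-written SQL script the same edit has to be repeated on every row, which is the tedium described above.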

Similarly, when writing unit tests, you do a lot of edge-case testing: a bunch of nearly identical tests with maybe one variable changing. At most you write one of those tests; then, after you name the next unit test, Copilot auto-generates the rest. It's pretty good at guessing what you want to do in that test, at least with my naming scheme.
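For the "one variable changes" edge-case pattern, a different technique (not what the poster describes, which is letting Copilot generate each near-duplicate test) is a table-driven test, where each case is one row of data. A minimal sketch with Python's standard `unittest` module; `clamp` is a hypothetical function under test:

```python
import unittest


def clamp(value, low, high):
    """Hypothetical function under test: limit value to the closed range [low, high]."""
    if low > high:
        raise ValueError("low must not exceed high")
    return max(low, min(value, high))


class ClampEdgeCaseTest(unittest.TestCase):
    # Each tuple is one "almost the same" test case: (description, input value, expected result).
    CASES = [
        ("below range", -5, 0),
        ("at lower bound", 0, 0),
        ("inside range", 5, 5),
        ("at upper bound", 10, 10),
        ("above range", 15, 10),
    ]

    def test_clamp_edge_cases(self):
        for name, value, expected in self.CASES:
            # subTest reports each failing case separately instead of stopping at the first one.
            with self.subTest(name, value=value):
                self.assertEqual(clamp(value, 0, 10), expected)

    def test_rejects_inverted_range(self):
        with self.assertRaises(ValueError):
            clamp(5, 10, 0)
```

Run it with `python -m unittest <file>`. Adding another edge case is one more tuple in `CASES` rather than another copied test method.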

So yeah, it's way overrated for many, many things, but for programming it's a pretty awesome productivity tool.

hertzdentalbar@lemmy.blahaj.zone wrote:

Did you make it? Or did you prompt it? They ain't quite the same.

#90 fossilesque@mander.xyz (edited)

It calls Ollama with a prompt. It's a bit complex because it also renames, moves, and sorts stuff.
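Purely as an illustration, not fossilesque's actual code: a minimal sketch of a script that asks a local Ollama server for a folder name and moves a file there. It assumes Ollama's default local endpoint (`http://localhost:11434/api/generate`, non-streaming) and a locally pulled model; the model name `llama3`, the prompt wording, and all helper names are hypothetical, and renaming is left out:

```python
import json
import re
import shutil
import urllib.request
from pathlib import Path

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "llama3"  # hypothetical: any model you have pulled locally


def build_prompt(filename):
    """Hypothetical prompt asking the model for a single folder name."""
    return (
        "Suggest a one-word folder name (lowercase letters only) for a file named "
        f"{filename!r}. Reply with the folder name only."
    )


def ask_ollama(prompt, url=OLLAMA_URL, model=MODEL):
    """Send a non-streaming request to Ollama's /api/generate and return the text reply."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    request = urllib.request.Request(url, data=payload, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


def sanitize_folder(reply, fallback="unsorted"):
    """Keep only the first run of lowercase letters, since models don't always follow instructions."""
    words = re.findall(r"[a-z]+", reply.lower())
    return words[0] if words else fallback


def sort_file(path, root, ask=ask_ollama):
    """Move `path` into root/<suggested folder>/ and return the new location."""
    path = Path(path)
    folder = Path(root) / sanitize_folder(ask(build_prompt(path.name)))
    folder.mkdir(parents=True, exist_ok=True)
    return Path(shutil.move(str(path), str(folder / path.name)))
```

Passing `ask` as a parameter lets you test the file-moving logic with a stub instead of a running model.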

In reply to #85 by outhouseperilous@lemmy.dbzer0.com

#91 jsomae@lemmy.ml

Semantics.

In reply to #91 by jsomae@lemmy.ml

#92 outhouseperilous@lemmy.dbzer0.com

No, it matters. You're pushing the lie they want pushed.

In reply to #88 by sheogorath@lemmy.world

#93 cavemanfreak@programming.dev

I'd say Jan Botanist is also up there as a pretty great person.

In reply to #93 by cavemanfreak@programming.dev

#94 sheogorath@lemmy.world

Jan Refiner is up there for me.

In reply to #89 by kameecoding@lemmy.world

#95 nalivai@discuss.tchncs.de

Keep doing what you do. Your company will pay me handsomely to throw out all your bullshit and write working code you can trust when you're done. If your company wants to have a product in the future, that is.

shayeta@feddit.org wrote:

It doesn't matter if you need a human to review. AI has no way of distinguishing between success and failure. Either way, a human will have to review 100% of those tasks.

#96 outbound7404@lemmy.ml

A human can review something close to correct a lot better than starting the task from zero.

In reply to #95 by nalivai@discuss.tchncs.de

#97 kameecoding@lemmy.world (edited)

Lmao, okay buddy. Based on how many interviews I have sat in on, the chances that you are a worse programmer than me are much higher than the chances that you are a better one.

Being a pompous ass who dismisses new tooling makes your chances even worse.

In reply to melvin_ferd@lemmy.world (the same post #83 replies to)

#98 suburban_hillbilly@lemmy.ml

Gell-Mann amnesia effect - Wikipedia

(en.m.wikipedia.org)

In reply to #89 by kameecoding@lemmy.world

#99 dahgangalang@infosec.pub

Yeah, it (in my case, ChatGPT) has been great for helping me along with functions I'm only passingly familiar with or am trying to use in new ways.

One thing that really surprised me: it gave me a robust, sensible, and (seemingly) well-tailored checklist of things to inspect on a used car I intend to buy. I already mostly know what I'm doing there, but it pointed out some things I might have overlooked or didn't know were points of concern for the specific vehicle I'm looking at.

In reply to eli001@lemmy.world (post without text content)

#100 apeno1@lemmy.world

They've done studies, you know. 30% of the time, it works every time.

In reply to eli001@lemmy.world (post without text content)

#101 burgerpocalyse@lemmy.world

I don't know why, but I am reminded of this clip about an eggless omelette: https://youtu.be/9Ah4tW-k8Ao

In reply to #96 by outbound7404@lemmy.ml

#102 dreamlandlividity@lemmy.world

It is a lot harder to notice incorrect information during review than to make sure it is correct while writing it.
