AI agents wrong ~70% of time: Carnegie Mellon study
-
This post did not contain any content.
I don't know why, but I am reminded of this clip about an eggless omelette https://youtu.be/9Ah4tW-k8Ao
-
A human can review something that's close to correct a lot more easily than starting the task from zero.
It is a lot harder to notice incorrect information in review than it is to make sure it is correct when writing it.
-
Lmao, okay buddy. Based on how many interviews I have sat in on, the chances that you are a worse programmer than me are much higher than the chances that you're better than me.
Being a pompous ass dismissive of new tooling makes your chances even worse.
I’ve been in the industry a while and your assessment is dead on.
As long as you’re not blindly committing the code, it’s a huge time saver for a number of mundane tasks.
It’s especially fantastic for writing throwaway tooling. Need data massaged a specific way? Ez pz. Need a script to execute an API call on each entry in a spreadsheet? No problem.
The guy above you is a nutter. Not sure if people haven’t tried leveraging LLMs or what. It has a ton of faults, but it really does speed up the mundane work. Also, clearly the person is either brand new to the field or doesn’t even work in it. Otherwise they would have seen the barely functional shite that actual humans churn out.
Part of me wonders if code organization is going to start optimizing for interpretation by these models rather than humans.
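That spreadsheet use case really is the sweet spot. As a minimal sketch of what the model typically hands back, with a made-up endpoint and column names:

```python
# Call an API once per row of a CSV and save the responses.
# The endpoint URL and the "id" column are hypothetical.
import csv

import requests

API = "https://api.example.com/lookup"  # made-up endpoint

with open("entries.csv", newline="") as f, open("results.csv", "w", newline="") as out:
    reader = csv.DictReader(f)
    writer = csv.writer(out)
    writer.writerow(["id", "status", "body"])
    for row in reader:
        resp = requests.get(API, params={"q": row["id"]}, timeout=10)
        writer.writerow([row["id"], resp.status_code, resp.text[:200]])
```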
-
Lmao, okay buddy. Based on how many interviews I have sat in on, the chances that you are a worse programmer than me are much higher than the chances that you're better than me.
Being a pompous ass dismissive of new tooling makes your chances even worse.
The person who uses fancy autocomplete to write their code will be exactly the person who thinks they're better than everyone. Those traits are correlated.
-
Yeah, it (in my case, ChatGPT) has been great for helping me along with functions I'm only passingly familiar with / trying to use in new ways.
One that I was really surprised with was that it gave me a surprisingly robust, sensible, and (seemingly) well-tuned-to-my-case checklist of things to inspect on a used car I intend to buy. I'm already mostly familiar with what I'm doing there, but it pointed to some things I might've overlooked / didn't know were points of concern for the specific vehicle I'm looking at.
Pepperidge Farm remembers when you could just do a web search and get it answered in the first couple of results. Then the SEO wars happened...
-
I’ve been in the industry a while and your assessment is dead on.
As long as you’re not blindly committing the code, it’s a huge time saver for a number of mundane tasks.
It’s especially fantastic for writing throwaway tooling. Need data massaged a specific way? Ez pz. Need a script to execute an API call on each entry in a spreadsheet? No problem.
The guy above you is a nutter. Not sure if people haven’t tried leveraging LLMs or what. It has a ton of faults, but it really does speed up the mundane work. Also, clearly the person is either brand new to the field or doesn’t even work in it. Otherwise they would have seen the barely functional shite that actual humans churn out.
Part of me wonders if code organization is going to start optimizing for interpretation by these models rather than humans.
When LLMs get it right, it's because they're summarizing a Stack Overflow or GitHub snippet they were trained on. But you lose all the benefits of other humans commenting on the context, pitfalls, and other alternatives.
-
Yes, that's generally useless, and it should not be shoved down people's throats. But 30% accuracy still has its uses, especially if the result can be programmatically verified.
Run something with a 70% failure rate 10x and you get a cumulative pass rate of about 97%.
LLMs don't get tired and they can be run in parallel.
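For anyone checking the arithmetic, a back-of-the-envelope sketch. The big assumption is that attempts are independent, which is generous, since a model will often fail the same way on the same prompt twice:

```python
# Chance that at least one of n attempts passes, given a fixed per-attempt
# failure rate and (optimistically) independent attempts.
failure_rate = 0.7

for n in (1, 3, 5, 10):
    at_least_one_pass = 1 - failure_rate ** n
    print(f"{n:>2} attempts -> {at_least_one_pass:.1%} chance of at least one pass")

# 10 attempts -> 97.2% chance of at least one pass
```
-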
At least AI won't fire you.
DOGE has entered the chat
-
"...for multi-step tasks"
It's about agents, which implies multi-step, since agents are meant to execute a series of tasks, as opposed to studies looking at base LLM performance.
-
They've done studies, you know. 30% of the time, it works every time.
I ask AI to write simple little programs. One time in three they actually compile without errors. To the credit of the AI, I can feed it the error and about half the time it will fix it. Then, when it compiles and runs without crashing, about one time in three it will actually do what I wanted. To the credit of AI, I can give it revised instructions and about half the time it can fix the program to work as intended.
So, yeah, a lot like interns.
-
How do I set up event-driven document ingestion from OneDrive located on an Azure tenant to Amazon DocumentDB? Ingestion must be near-realtime, durable, and have some form of DLQ.
I think you could read OneDrive's notifications for new files, parse them, and pipe them to DocumentDB via some microservice or Lambda, depending on the scale of your solution.
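One hedged way to wire that up, using only pieces known to exist: an API Gateway endpoint registered as the Microsoft Graph webhook, a tiny function that answers Graph's validationToken handshake and forwards notifications to SQS (the queue's redrive policy is your DLQ), and a worker Lambda that runs the delta query and upserts into DocumentDB. A sketch of the worker; get_graph_token() is a hypothetical client-credentials helper, and the environment variable names are assumptions:

```python
# Worker Lambda: triggered by SQS messages forwarded from the Graph webhook.
# A raised exception re-queues the message; repeated failures land in the DLQ
# via the queue's redrive policy.
import os

import requests
from pymongo import MongoClient

GRAPH = "https://graph.microsoft.com/v1.0"
DRIVE_ID = os.environ["DRIVE_ID"]

docdb = MongoClient(os.environ["DOCDB_URI"], tls=True)  # DocumentDB speaks MongoDB
files = docdb["ingest"]["onedrive_files"]


def handler(event, context):
    token = get_graph_token()  # hypothetical: MSAL client-credentials flow
    headers = {"Authorization": f"Bearer {token}"}

    # Graph change notifications carry no file data, so each one triggers a
    # delta query. A real version would persist the @odata.deltaLink between
    # runs instead of re-walking the whole drive.
    url = f"{GRAPH}/drives/{DRIVE_ID}/root/delta"
    while url:
        resp = requests.get(url, headers=headers, timeout=30)
        resp.raise_for_status()
        page = resp.json()
        for item in page.get("value", []):
            if "file" in item:  # skip folders
                files.replace_one(
                    {"_id": item["id"]},
                    {
                        "_id": item["id"],
                        "name": item["name"],
                        "lastModified": item["lastModifiedDateTime"],
                        "downloadUrl": item.get("@microsoft.graph.downloadUrl"),
                    },
                    upsert=True,
                )
        url = page.get("@odata.nextLink")
```

The SQS hop between the webhook and the worker is what buys near-realtime plus durability: Graph wants a fast 2xx from the webhook endpoint, and the queue absorbs bursts and feeds the DLQ.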
-
I'd just like to point out that, from the perspective of somebody watching AI develop for the past 10 years, completing 30% of automated tasks successfully is pretty good! Ten years ago they could not do this at all. Overlooking all the other issues with AI, I think we are all irritated with the AI hype people for saying things like they can be right 100% of the time -- Amazon's new CEO actually said they would be able to achieve 100% accuracy this year, lmao. But being able to do 30% of tasks successfully is already useful.
being able to do 30% of tasks successfully is already useful.
If you have a good testing program, it can be.
If you use AI to write the test cases...? I wouldn't fly on that airplane.
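The "good testing program" version of that loop, sketched. The test table is the human-written part; generate_candidate() stands in for a model call plus whatever exec/import step turns its output into a callable, so it's an assumption, not a real API:

```python
# Keep asking for candidates until one passes a human-written test table.
HUMAN_WRITTEN_CASES = [   # e.g. for an is_palindrome(s) function
    ("racecar", True),
    ("hello", False),
    ("", True),
]

def accept(candidate_fn):
    return all(candidate_fn(s) == want for s, want in HUMAN_WRITTEN_CASES)

def first_passing(max_attempts=10):
    for _ in range(max_attempts):
        fn = generate_candidate("write an is_palindrome(s) function")  # hypothetical
        try:
            if accept(fn):
                return fn
        except Exception:
            pass  # a crashing candidate is just a failed attempt
    raise RuntimeError("no candidate passed the test table")
```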
-
This post did not contain any content.
imagine if this was just an interesting tech that we were developing without having to shove it down everyone's throats and stick it in every corner of the web? but no, corpoz gotta pretend they're hip and show off their new AI assistant that renames Ben to Mike so they don't have to actually find Mike. capitalism ruins everything.
-
It doesn't matter if you need a human to review. AI has no way of distinguishing between success and failure. Either way, a human will have to review 100% of those tasks.
I have been using AI to write (little, near trivial) programs. It's blindingly obvious that it could be feeding this code to a compiler and catching its mistakes before giving them to me, but it doesn't... yet.
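That loop is maybe twenty lines of glue. A sketch, with ask_llm() as a hypothetical wrapper around whatever model you're using and cc assumed to be on PATH:

```python
# Compile the model's output and, on failure, hand the compiler errors back
# to the model before a human ever sees the code.
import subprocess
import tempfile
from pathlib import Path

def compile_c(source: str) -> str:
    """Return '' on success, otherwise the compiler's error output."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "prog.c"
        src.write_text(source)
        result = subprocess.run(
            ["cc", str(src), "-o", str(Path(tmp) / "prog")],
            capture_output=True, text=True,
        )
        return "" if result.returncode == 0 else result.stderr

def generate_until_it_compiles(prompt: str, max_rounds: int = 3) -> str:
    code = ask_llm(prompt)  # hypothetical model call
    for _ in range(max_rounds):
        errors = compile_c(code)
        if not errors:
            return code  # it compiles; whether it's correct is still on the human
        code = ask_llm(f"{prompt}\n\nYour last attempt failed to compile:\n"
                       f"{errors}\nPlease fix it.")
    raise RuntimeError("still doesn't compile after retries")
```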
-
A human can review something that's close to correct a lot more easily than starting the task from zero.
In university I knew a lot of students who knew all the things but "just didn't know where to start". If I gave them a little direction about where to start, they could run it to the finish all on their own.
-
AI can't even understand its own brain well enough to write about it.
-
It is a lot harder to notice incorrect information in review than it is to make sure it is correct when writing it.
Depends on the context. There is a lot of work in the scientific methods community trying to use NLP to augment traditionally fully human processes, such as thematic analysis and systematic literature reviews, and you can have validation protocols there without 100% human review.
-
Are you guys sure? The media seems to be where a lot of the LLM hate originates.
That is such a ridiculous idea. Just because you see hate for it in the media doesn't mean it originated there. I'll have you know that I have embarrassed myself by screaming at robot phone receptionists for years now. Stupid fuckers, pretending to be people but not knowing shit. I was born ready to hate LLMs, and I'm not gonna have you claim that CNN made me do it.
-
It is a lot harder to notice incorrect information in review than it is to make sure it is correct when writing it.
That depends entirely on your writing method and attention span for review.
Most people make stuff up off the cuff and skim anything longer than 75 words when reviewing, so the bar for AI improving over that is really low.
-
Wrong 70% of the time doing what?
I’ve used LLMs as a Stack Overflow / MSDN replacement for over a year and if they fucked up 7/10 questions I’d stop.
Same with code: any free model can easily generate simple scripts and utilities with maybe a 10% error rate, definitely not 70%.
It specifies the tasks in the article.