linux-nerds.org

AI agents wrong ~70% of time: Carnegie Mellon study

Technology

152 posts 76 commenters 3 views

C cavemanfreak@programming.dev

I'd say Jan Botanist is also up there as being a pretty great person.
sheogorath@lemmy.world wrote:

#94

Jan Refiner is up there for me.
1 reply

1
K kameecoding@lemmy.world

For me as a software developer the accuracy is more in the 95%+ range.

On one hand, the built-in Copilot chat widget in IntelliJ basically replaces a lot of my Google queries.

On the other hand, it is rather fucking good at executing rewrites that are a fucking chore to do manually but can easily be done by Copilot.

Imagine you have a script that initializes your DB with some test data. You have an INSERT INTO statement with lots of columns and rows, like so:

INSERT INTO (column1, ..., column n)
VALUES row 1,
row 2,
row n

Adding a new column with test data for each row is a PITA, but Copilot handles it without issue.

Similarly, when writing unit tests you do a lot of edge-case testing, which means a bunch of almost identical tests with maybe one variable changing. At most you write one of those tests, then Copilot will auto-generate the rest after you name the next unit test. It's pretty good at guessing what you want to do in that test, at least with my naming scheme.

So yeah, it's way overrated for many, many things, but for programming it's a pretty awesome productivity tool.
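For readers who have not seen the pattern being described, the "almost identical tests with one variable changing" can be sketched as a table-driven test. This is only an illustration: `clamp` and the case names are hypothetical and do not come from the poster's codebase.

```python
import unittest


def clamp(value: int, low: int, high: int) -> int:
    """Hypothetical function under test: restrict value to [low, high]."""
    return max(low, min(value, high))


# One tuple per edge case: (case name, value, low, high, expected result).
EDGE_CASES = [
    ("below_range", -5, 0, 10, 0),
    ("at_lower_bound", 0, 0, 10, 0),
    ("inside_range", 7, 0, 10, 7),
    ("at_upper_bound", 10, 0, 10, 10),
    ("above_range", 99, 0, 10, 10),
]


class ClampEdgeCases(unittest.TestCase):
    def test_edge_cases(self):
        # subTest reports each failing case separately instead of stopping
        # at the first failure.
        for name, value, low, high, expected in EDGE_CASES:
            with self.subTest(case=name):
                self.assertEqual(clamp(value, low, high), expected)
```

Saved in a file, this can be run with `python -m unittest`. Whether the rows are typed by hand or suggested by an assistant, the expected values still have to be checked by a person.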
nalivai@discuss.tchncs.de wrote:

#95

Keep doing what you do. Your company will pay me handsomely to throw out all your bullshit and write working code you can trust when you're done. If your company wants to have a product in the future, that is.
1 reply

3
S shayeta@feddit.org

It doesn't matter if you need a human to review. AI has no way of distinguishing between success and failure. Either way, a human will have to review 100% of those tasks.
outbound7404@lemmy.ml wrote:

#96

A human can review something close to correct a lot better than starting the task from zero.
2 replies

2
N nalivai@discuss.tchncs.de

Keep doing what you do. Your company will pay me handsomely to throw out all your bullshit and write working code you can trust when you're done. If your company wants to have a product in the future, that is.
kameecoding@lemmy.world wrote (last edited by kameecoding@lemmy.world):

#97

Lmao, okay buddy, based on how many interviews I have sat in on, the chances that you are a worse programmer than me are much higher than the chances that you are better than me.

Being a pompous ass dismissive of new tooling makes your chances even worse.
2 replies

2
M melvin_ferd@lemmy.world

OK, what about the tech journalists who produced articles with those misunderstandings? Surely they know better, yet they still produce articles like this. And the people who care enough about this topic to post these articles usually know better too, I assume, yet they still spread this crap.
suburban_hillbilly@lemmy.ml wrote:

#98

Gell-Mann amnesia effect - Wikipedia

(en.m.wikipedia.org)
2 replies

5
K kameecoding@lemmy.world

For me as a software developer the accuracy is more in the 95%+ range.

On one hand, the built-in Copilot chat widget in IntelliJ basically replaces a lot of my Google queries.

On the other hand, it is rather fucking good at executing rewrites that are a fucking chore to do manually but can easily be done by Copilot.

Imagine you have a script that initializes your DB with some test data. You have an INSERT INTO statement with lots of columns and rows, like so:

INSERT INTO (column1, ..., column n)
VALUES row 1,
row 2,
row n

Adding a new column with test data for each row is a PITA, but Copilot handles it without issue.

Similarly, when writing unit tests you do a lot of edge-case testing, which means a bunch of almost identical tests with maybe one variable changing. At most you write one of those tests, then Copilot will auto-generate the rest after you name the next unit test. It's pretty good at guessing what you want to do in that test, at least with my naming scheme.

So yeah, it's way overrated for many, many things, but for programming it's a pretty awesome productivity tool.
dahgangalang@infosec.pub wrote:

#99

Yeah, it (in my case, ChatGPT) has been great for helping me along with functions I'm only passingly familiar with or am trying to use in new ways.

One thing that really surprised me was that it gave me a robust, sensible, and (seemingly) well-tuned-to-my-case checklist of things to inspect on a used car I intend to buy. I'm already mostly familiar with what I'm doing there, but it pointed to some things I might've overlooked or didn't know were points of concern for the specific vehicle I'm looking at.
1 reply

1
E eli001@lemmy.world

This post did not contain any content.
apeno1@lemmy.world wrote:

#100

They've done studies, you know. 30% of the time, it works every time.
1 reply

7
E eli001@lemmy.world

This post did not contain any content.
burgerpocalyse@lemmy.world wrote:

#101

I don't know why, but I am reminded of this clip about an eggless omelette: https://youtu.be/9Ah4tW-k8Ao
1 reply

2
O outbound7404@lemmy.ml

A human can review something close to correct a lot better than starting the task from zero.
dreamlandlividity@lemmy.world wrote:

#102

It is a lot harder to notice incorrect information in review than to make sure it is correct when writing it.
2 replies

4
K kameecoding@lemmy.world

Lmao, okay buddy, based on how many interviews I have sat in on, the chances that you are a worse programmer than me are much higher than the chances that you are better than me.

Being a pompous ass dismissive of new tooling makes your chances even worse.
potentialproblem@sh.itjust.works wrote:

#103

I’ve been in the industry a while and your assessment is dead on.

As long as you’re not blindly committing the code, it’s a huge time saver for a number of mundane tasks.

It’s especially fantastic for writing throwaway tooling. Need data massaged a specific way? Ez pz. Need a script to execute an API call on each entry in a spreadsheet? No problem.

The guy above you is a nutter. Not sure if people haven’t tried leveraging LLMs or what. It has a ton of faults, but it really does speed up the mundane work. Also, clearly the person is either brand new to the field or doesn’t even work in it. Otherwise they would have seen the barely functional shite that actual humans churn out.

Part of me wonders if code organization is going to start optimizing for interpretation by these models rather than humans.
1 reply

1
K kameecoding@lemmy.world

Lmao, okay buddy, based on how many interviews I have sat in on, the chances that you are a worse programmer than me are much higher than the chances that you are better than me.

Being a pompous ass dismissive of new tooling makes your chances even worse.
nalivai@discuss.tchncs.de wrote:

#104

The person who uses fancy autocomplete to write their code will be exactly the person who thinks they're better than everyone. Those traits are correlated.
1 reply

2
D dahgangalang@infosec.pub

Yeah, it (in my case, ChatGPT) has been great for helping me along with functions I'm only passingly familiar with or am trying to use in new ways.

One thing that really surprised me was that it gave me a robust, sensible, and (seemingly) well-tuned-to-my-case checklist of things to inspect on a used car I intend to buy. I'm already mostly familiar with what I'm doing there, but it pointed to some things I might've overlooked or didn't know were points of concern for the specific vehicle I'm looking at.
zbyte64@awful.systems wrote:

#105

Pepperidge Farm remembers when you could just do a web search and get it answered in the first couple of results. Then the SEO wars happened....
1 reply

1
P potentialproblem@sh.itjust.works

I’ve been in the industry a while and your assessment is dead on.

As long as you’re not blindly committing the code, it’s a huge time saver for a number of mundane tasks.

It’s especially fantastic for writing throwaway tooling. Need data massaged a specific way? Ez pz. Need a script to execute an API call on each entry in a spreadsheet? No problem.

The guy above you is a nutter. Not sure if people haven’t tried leveraging LLMs or what. It has a ton of faults, but it really does speed up the mundane work. Also, clearly the person is either brand new to the field or doesn’t even work in it. Otherwise they would have seen the barely functional shite that actual humans churn out.

Part of me wonders if code organization is going to start optimizing for interpretation by these models rather than humans.
zbyte64@awful.systems wrote:

#106

When LLMs get it right, it's because they're summarizing a Stack Overflow or GitHub snippet they were trained on. But you lose all the benefit of other humans commenting on the context, pitfalls, and alternatives.
2 replies

1
J jsomae@lemmy.ml

yes, that's generally useless. It should not be shoved down people's throats. 30% accuracy still has its uses, especially if the result can be programmatically verified.
knock_knock_lemmy_in@lemmy.world wrote:

#107

Run something with a 70% failure rate 10x and you get to a cumulative pass rate of about 97% (1 - 0.7^10).
LLMs don't get tired and they can be run in parallel.
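The arithmetic behind the retry argument can be checked with a few lines of Python. This is a sketch under two assumptions that the thread does not establish: the runs are independent, and something (a test suite, a compiler, a checker) can tell which run succeeded. The function name is made up for the example.

```python
def cumulative_pass_rate(p_success: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent runs succeeds."""
    if not 0.0 <= p_success <= 1.0:
        raise ValueError("p_success must be between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must not be negative")
    # All runs fail with probability (1 - p)^n; anything else is a pass.
    return 1.0 - (1.0 - p_success) ** attempts


if __name__ == "__main__":
    for n in (1, 3, 5, 10):
        print(n, round(cumulative_pass_rate(0.30, n), 3))
```

With a 30% per-run success rate, ten runs come out at roughly 97%. If failures are correlated (the same prompt tends to fail the same way), the real figure will be lower, and without an automatic check the extra runs only produce more output for a human to review.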
1 reply

1
T tankovayadiviziya@lemmy.world

At least AI won't fire you.
zbyte64@awful.systems wrote:

#108

DOGE has entered the chat
1 reply

3
A affidavit@lemmy.world

"...for multi-step tasks"
loonsun@sh.itjust.works wrote:

#109

It's about agents, which implies multi-step, since agents are meant to execute a series of tasks, as opposed to studies looking at base LLM performance.
1 reply

3
A apeno1@lemmy.world

They've done studies, you know. 30% of the time, it works every time.
mangocats@feddit.it wrote:

#110

I ask AI to write simple little programs. One time in three they actually compile without errors. To the credit of the AI, I can feed it the error and about half the time it will fix it. Then, when it compiles and runs without crashing, about one time in three it will actually do what I wanted. To the credit of AI, I can give it revised instructions and about half the time it can fix the program to work as intended.

So, yeah, a lot like interns.
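Taking the anecdotal rates above at face value, the overall odds can be worked out exactly. The sketch assumes exactly one round of feedback per stage and that the two stages are independent; neither assumption is stated in the post, and the helper name is invented for the example.

```python
from fractions import Fraction


def success_with_one_retry(first_try: Fraction, fix_rate: Fraction) -> Fraction:
    """Succeed outright, or fail once and then be fixed after feedback."""
    return first_try + (1 - first_try) * fix_rate


# Stage 1: compiles one time in three; half of the failures get fixed.
compiles = success_with_one_retry(Fraction(1, 3), Fraction(1, 2))
# Stage 2: does the right thing one time in three; half of the misses get fixed.
behaves = success_with_one_retry(Fraction(1, 3), Fraction(1, 2))

overall = compiles * behaves
print(compiles, behaves, overall)  # 2/3 2/3 4/9
```

So under these assumptions a little under half of the requests (4/9) end in a working program after at most one correction per stage, compared with 1/9 if no feedback is given at all.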
1 reply

5
S shayeta@feddit.org

How do I set up event driven document ingestion from OneDrive located on an Azure tenant to Amazon DocumentDB? Ingestion must be near-realtime, durable, and have some form of DLQ.
strobelt@lemmy.world wrote:

#111

I think you could read OneDrive's notifications for new files, parse them, and pipe them to DocumentDB via some microservice or Lambda, depending on the scale of your solution.
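The shape of that suggestion, including the dead-letter queue asked for in the question, can be sketched in plain Python. Everything here is a stand-in: `Notification`, `IngestPipeline`, and `fake_store` are hypothetical names, the queue and the document store are in-memory, and no real OneDrive or DocumentDB API is called. A production setup would put a durable queue between the notification source and the handler.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Notification:
    """Hypothetical change notification: an item id and a file path."""
    item_id: str
    path: str


@dataclass
class IngestPipeline:
    store: Callable[[Notification], None]  # stand-in for the database write
    max_attempts: int = 3
    dead_letters: list = field(default_factory=list)

    def handle(self, note: Notification) -> bool:
        """Try to store the item; park it in the DLQ after repeated failures."""
        last_error = None
        for _ in range(self.max_attempts):
            try:
                self.store(note)
                return True
            except Exception as exc:  # broad catch is acceptable in a sketch
                last_error = exc
        self.dead_letters.append((note, repr(last_error)))
        return False


# In-memory stand-in for the document store.
documents = {}


def fake_store(note: Notification) -> None:
    if note.path.endswith(".tmp"):
        raise ValueError("unsupported file type")
    documents[note.item_id] = {"path": note.path}


pipeline = IngestPipeline(store=fake_store)
pipeline.handle(Notification("1", "/docs/report.docx"))
pipeline.handle(Notification("2", "/docs/scratch.tmp"))
print(len(documents), len(pipeline.dead_letters))
```

The point of the dead-letter list is that a failing item is neither lost nor retried forever: it is set aside with its error so a person or a later job can inspect it.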
1 reply

0
J jsomae@lemmy.ml

I'd just like to point out that, from the perspective of somebody watching AI develop for the past 10 years, completing 30% of automated tasks successfully is pretty good! Ten years ago they could not do this at all. Overlooking all the other issues with AI, I think we are all irritated with the AI hype people for saying things like they can be right 100% of the time -- Amazon's new CEO actually said they would be able to achieve 100% accuracy this year, lmao. But being able to do 30% of tasks successfully is already useful.
mangocats@feddit.it wrote:

#112

being able to do 30% of tasks successfully is already useful.

If you have a good testing program, it can be.

If you use AI to write the test cases...? I wouldn't fly on that airplane.
1 reply

3
E eli001@lemmy.world

This post did not contain any content.
timeworntraveler@lemmy.dbzer0.com wrote (last edited by timeworntraveler@lemmy.dbzer0.com):

#113

Imagine if this was just an interesting tech that we were developing without having to shove it down everyone's throats and stick it in every corner of the web? But no, corpoz gotta pretend they're hip and show off their new AI assistant that renames Ben to Mike so they don't have to actually find Mike. Capitalism ruins everything.
1 reply

16

J

“I made a digital legacy prompt vault — and it might outlive me.”
Technology
1 vote 1 post 0 views
No replies yet
A

Oculus founder Palmer Luckey leads group of tech billionaires launching new crypto-bank — aims to fill the void left by Silicon Valley Bank's 2023 collapse
Technology
84 votes 26 posts 26 views

K

So jail them on funding those ventures. Thought crimes are a bad thing, no matter who you direct them at.
P

Inside the face scanning tech behind social media age limits
Technology
25 votes 1 post 8 views
No replies yet
B

Sitting up and waiting.
Technology
5 votes 7 posts 33 views

A

What new AI slop hell is this?
P

Russia prepares to get rid of WhatsApp and possibly Telegram: Parliament passed a law pertaining to a national messaging app
Technology
92 votes 16 posts 6 views

W

Telegram isn't banned in Ukraine. Can't be that bad.
P

WhatsApp is getting ads using personal data from Instagram and Facebook
Technology
51 votes 2 posts 16 views

B

So glad I never got on WhatsApp
P

UK Office of Communications (Ofcom) launches nine Online Safety Act investigations, including into 4chan over alleged illegal content and into seven file-sharing services over possible CSAM
Technology
23 votes 4 posts 4 views

D

Whew..... None of the important file hosters ..
P

AI cheating surge pushes schools into chaos
Technology
45 votes 25 posts 92 views

C

Sorry for the late reply, I had to sit and think on this one for a little bit. I think there would be a few things going on when it comes to designing a course to teach critical thinking, nuance, and originality, and they each have their own requirements.

For critical thinking: the main goal is to provide students with a toolbelt for solving various problems, then instill the habit of always asking "Does this match the expected outcome? What was I expecting?" So courses will usually be set up so students learn about a tool, practice using the tool, then have a culminating assignment that uses all the tools. Ideally, the problems students face at the end require multiple tools to solve.

Nuance mainly comes naturally with exposure to the material from a professional. The way a mechanical engineer describes building a desk will probably differ greatly from the way a fantasy author does. You can also explain definitions and industry standards, but that's really dry, so I try to teach nuance via definitions by mixing in the weird nuances as much as possible with jokes.

Then for originality: I've realized I don't actually look for an original idea, but for something creative. In a classroom setting, you're usually learning new things about a subject, so a student's knowledge of that space is usually very limited. Thus, an idea that they've never heard about may be original to them but common for an industry expert. For teaching creativity, I usually provide time to be creative and think, and provide open-ended questions as prompts to explore ideas. My courses that require originality usually have it as part of the culminating assignment at the end, where students can apply their knowledge. I'll also add in time where students can come to me with preliminary ideas and I can provide feedback on whether or not they pass the creative threshold. Not all ideas are original, but I sometimes give a bit of slack if it's creative enough.

The amount of course overhauling needed to get around AI really depends on the material being taught. In programming, for example, you teach critical thinking by always testing your code, even with parameters that don't make sense: try to add 123 + "skibbidy" and see what the program does.
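As a concrete version of that exercise, here is what the experiment looks like in Python. The post does not name a language, so Python is an assumption, and `add` and `describe_add` are throwaway names for the demonstration.

```python
def add(a, b):
    """Naive addition with no input validation, as a student might write it."""
    return a + b


def describe_add(a, b) -> str:
    """Report what add() does with the given inputs instead of crashing."""
    try:
        return f"ok: {add(a, b)!r}"
    except TypeError as exc:
        return f"TypeError: {exc}"


print(describe_add(123, 456))           # two numbers: ordinary addition
print(describe_add(123, "skibbidy"))    # number + text: Python raises TypeError
print(describe_add("123", "skibbidy"))  # text + text: silently concatenates
```

The third call is the interesting one for a classroom: nothing fails, yet the result is a concatenated string rather than a sum. Other languages behave differently with the mixed case (JavaScript, for instance, would concatenate instead of raising), which is why actually running the nonsense input is worth the habit.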