AI agents wrong ~70% of time: Carnegie Mellon study
-
I have actually been doing this lately: iteratively prompting AI to write software and fix its errors until something useful comes out. It's a lot like machine translation. I speak fluent C++ but I don't speak Rust, yet I can hammer away at the AI (with English-language prompts) until it produces passable Rust for something I could have written myself in C++ in half the time and effort.
I also don't speak Finnish, but Google Translate can take what I say in English and put it into at least somewhat comprehensible Finnish without egregious translation errors most of the time.
Is this useful? When C++ is getting banned for "security concerns" and Rust is the required language, it's at least a little helpful.
I'm impressed you can make strides with Rust with AI. I am in a similar boat, except I've found LLMs are terrible with Rust.
-
No, it matters. You're pushing the lie they want pushed.
Hitler liked to paint; that doesn't make painting wrong. The fact that big tech is pushing AI isn't evidence against the utility of AI.
The fact that common parlance is to call machine learning "AI" these days doesn't matter to me in the slightest. Do you have a definition of "intelligence"? Do you object when pathfinding is called AI? Or STRIPS? Or bots in a video game? Dare I say it, the main difference between those AIs and LLMs is their generality -- so why not just call it GAI at this point, tbh. This is a question of semantics, so it really doesn't matter to the deeper question. It doesn't matter whether you call it AI or not; LLMs work the same way either way.
-
So you’re saying the article’s measurements about AI agents being wrong 70% of the time are made up? Or is AI performance only measurable when the results help anti-AI narratives?
I would definitely bet it's made up and poorly designed.
I wish that weren't the case, because having actual data would be nice, but these studies are almost always funded with some sort of intentional slant - for example, nic vape safety studies, where they clearly don't use the product sanely and then make wild claims about how there's lead in the vapes!
Homie, you're fucking running the shit completely dry for longer than any human could possibly actually hit the vape; no shit it's producing carcinogens.
Go burn a bunch of paper and directly inhale the smoke and tell me paper is dangerous.
-
I would definitely bet it's made up and poorly designed.
I wish that weren't the case, because having actual data would be nice, but these studies are almost always funded with some sort of intentional slant - for example, nic vape safety studies, where they clearly don't use the product sanely and then make wild claims about how there's lead in the vapes!
Homie, you're fucking running the shit completely dry for longer than any human could possibly actually hit the vape; no shit it's producing carcinogens.
Go burn a bunch of paper and directly inhale the smoke and tell me paper is dangerous.
Agreed. 70% is astoundingly high for today’s models. Something stinks.
-
We have created the overconfident intern in digital form.
Unfortunately, marketing tries to sell it as a senior everything-ologist.
-
DocumentDB is not for OneDrive documents (PDFs and such). It's for "documents" as in serialized objects (JSON or BSON).
That's even better; I can just jam something in before it and churn the documents through an embedding model. Thanks!
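If it helps make the distinction concrete, here's a minimal sketch of the kind of "document" a document database stores - a serialized object rather than a file - with hypothetical fields for text extracted from a PDF and its embedding (the field names, values, and pipeline are assumptions, not any particular product's schema):

```python
import json

# Hypothetical "document" in the database sense: a serialized object (JSON/BSON),
# not a PDF or Word file. The text and embedding fields assume the PDF has already
# been run through a text extractor and an embedding model of your choice.
doc = {
    "source_file": "report-2024.pdf",       # where the text came from
    "page": 3,
    "text": "Quarterly results were ...",    # extracted text chunk
    "embedding": [0.12, -0.58, 0.33, 0.91],  # vector from the embedding model (truncated)
}

print(json.dumps(doc, indent=2))  # this JSON object is what gets stored
```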
-
I use it for very specific tasks and give as much information as possible. I usually have to give it more feedback to get to the desired goal. For instance I will ask it how to resolve an error message. I've even asked it for some short python code. I almost always get good feedback when doing that. Asking it about basic facts works too like science questions.
One thing I have had problems with is if the error is sort of an oddball it will give me suggestions that don't work with my OS/app version even though I gave it that info. Then I give it feedback and eventually it will loop back to its original suggestions, so it couldn't come up with an answer.
I've also found differences between ChatGPT and MS Copilot, with ChatGPT usually giving better results.
-
please bro just one hundred more GPU and one more billion dollars of research, we make it good please bro
And let it suck up 10% or so of all of the power in the region.
-
The first half dozen times I tried AI for code, across the past year or so, it failed pretty much as you describe.
Finally, I hit on some things it can do. For me, keeping the instructions more general - not specifying certain libraries, for instance - was the key to getting something that actually does something. Also, if it doesn't show you the whole program, get it to show you the whole thing, and make it fix its own mistakes so you can build on working code with later requests.
I've had good results being very specific, like "Generate some python 3 code for me that converts X to Y, recursively through all subdirectories, and converts the files in place."
-
It's absolutely dangerous, but it doesn't have to work even a little to do damage; hell, it already has. Your thing just makes it sound much more capable than it is. And it is not.
Also, it's not AI.
Edit: and in a comment replying to this one, one of your fellow fanboys proved "everyone knows how they work" wrong.
The industrial revolution could be seen as dangerous, yet it brought the largest increase in standard of living in centuries.
-
So you’re saying the article’s measurements about AI agents being wrong 70% of the time are made up? Or is AI performance only measurable when the results help anti-AI narratives?
I mean, sure, in that the expectation is that the article is talking about AI in general. The cited paper is discussing LLMs and their ability to complete tasks. So we have to agree that LLMs are what we mean by AI, and that their ability to complete tasks is a valid metric for AI. If we accept the marketing hype, then of course LLMs are exactly what we've been talking about with AI, and we've accepted LLMs' features and limitations as what AI is. And if LLMs are prone to filling in whatever best fits the model without regard to accuracy, then by accepting LLMs as what we mean by AI, we accept that AI fills in its model without regard to accuracy.
-
I'm impressed you can make strides with Rust with AI. I am in a similar boat, except I've found LLMs are terrible with Rust.
I was 0/6 on various trials of AI for Rust over the past 6 months, then I caught a success. Turns out I was asking it to use a difficult library - I can't make the thing I want work in that library either (the library docs say it's possible, but...). When I posed a more open-ended request without specifying the library to use, it succeeded - after a fashion. It will give you code with cargo build errors; I copy-paste the error back to it like "address: <pasted error message>" and a bit more than half of the time it is able to respond with a working fix.
-
I mean, sure, in that the expectation is that the article is talking about AI in general. The cited paper is discussing LLMs and their ability to complete tasks. So we have to agree that LLMs are what we mean by AI, and that their ability to complete tasks is a valid metric for AI. If we accept the marketing hype, then of course LLMs are exactly what we've been talking about with AI, and we've accepted LLMs' features and limitations as what AI is. And if LLMs are prone to filling in whatever best fits the model without regard to accuracy, then by accepting LLMs as what we mean by AI, we accept that AI fills in its model without regard to accuracy.
Except you yourself just stated that it was impossible to measure performance of these things. When it’s favorable to AI, you claim it can’t be measured. When it’s unfavorable for AI, you claim of course it’s measurable. Your argument is so flimsy and your understanding so limited that you can’t even stick to a single idea. You’re all over the place.
-
I've had good results being very specific, like "Generate some python 3 code for me that converts X to Y, recursively through all subdirectories, and converts the files in place."
I have been more successful with baby steps like: "Write a python 3 program that converts X to Y." Tweak the prompt until that's working as desired, then: "make it work recursively through all subdirectories" - and again tweak with specifics like converting the files in place, etc. Always be very specific, and force it to fix its own bugs so you can move forward with a clean example as you add complexity. Complexity seems to cap out at a couple of pages of code, at which point it's "Oops, something went wrong."
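As a rough illustration of the end state those baby steps converge on, here's a minimal sketch - the specific conversion (tabs to spaces) is just a hypothetical stand-in for "X to Y":

```python
from pathlib import Path

# Hypothetical stand-in for "convert X to Y": expand tabs to 4 spaces.
def convert(text: str) -> str:
    return text.expandtabs(4)

# Walk all subdirectories recursively and convert matching files in place.
def convert_tree(root: str, suffix: str = ".txt") -> None:
    for path in Path(root).rglob(f"*{suffix}"):
        original = path.read_text(encoding="utf-8")
        converted = convert(original)
        if converted != original:
            path.write_text(converted, encoding="utf-8")  # overwrite in place
            print(f"converted {path}")

if __name__ == "__main__":
    convert_tree(".")
```

Each of the incremental prompts described above adds roughly one of these pieces at a time.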
-
A junior developer actually learns from doing the job; an LLM only learns when the training corpus is updated and a new model is trained.
An LLM costs less, and won't complain when yelled at.
-
The comparison is about the correctness of their work.
Their lives have nothing to do with it.
Human lives are the most important thing of all. Profits are irrelevant compared to human lives. I get that that's not how Bezos sees the world, but he's a monstrous outlier.
-
Run something with a 70% failure rate 10x and you get to a cumulative ~97% pass rate.
LLMs don't get tired and they can be run in parallel. What's 0.7^10?
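For what it's worth, here's the arithmetic behind that claim as a quick sketch, assuming each attempt is independent and the 70% failure rate holds per attempt (neither assumption is guaranteed in practice):

```python
# Probability that at least one of n independent attempts succeeds,
# given a constant 70% per-attempt failure rate.
failure_rate = 0.7
n = 10

p_all_fail = failure_rate ** n        # 0.7 ** 10 ≈ 0.028
p_at_least_one = 1 - p_all_fail       # ≈ 0.972, i.e. roughly 97%

print(f"P(all {n} attempts fail)   = {p_all_fail:.3f}")
print(f"P(at least one succeeds)  = {p_at_least_one:.3f}")
```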
-
Search AI on Lemmy and check out every article on it. It definitely is the media spreading all the hate. And an article like this is often just money-driven yellow journalism.
I think it's lemmy users. I see a lot more LLM skepticism here than in the news feeds.
In my experience, LLMs are like the laziest, shittiest know-nothing bozo forced to complete a task with zero attention to detail and zero care about whether it's crap, just doing enough to sound convincing.
-
That's not really helping though. The fact that you were transferred to them in the first place instead of directly to a human was an impediment.
Oh absolutely, nothing was gained, time was wasted. My wording was too charitable.
-
Except you yourself just stated that it was impossible to measure performance of these things. When it’s favorable to AI, you claim it can’t be measured. When it’s unfavorable for AI, you claim of course it’s measurable. Your argument is so flimsy and your understanding so limited that you can’t even stick to a single idea. You’re all over the place.
It's questionable to measure these things as being reflective of AI, because what AI is changes based on which piece of tech is being hawked as AI, because we're really bad at defining what intelligence is and isn't. You want to claim LLMs as AI? Go ahead, but then you also adopt the problems of LLMs as the problems of AI. Defining AI, and thus its metrics, is a moving target. When we can't agree on what it is, we can't agree on what it can do.
-