linux-nerds.org

AI agents wrong ~70% of time: Carnegie Mellon study

Technology

272 posts 107 commenters 79 views

jsomae@lemmy.ml

I'm impressed you can make strides with Rust with AI. I am in a similar boat, except I've found LLMs are terrible with Rust.
mangocats@feddit.it wrote:

#174

I was 0/6 on various trials of AI for Rust over the past 6 months, then I caught a success. Turns out, I was asking it to use a difficult library - I can't make the thing I want work in that library either (library docs say it's possible, but...). When I posed a more open-ended request without specifying the library to use, it succeeded - after a fashion. It will give you code with cargo build errors; I copy-paste the error back to it like "address: <pasted error message>" and a bit more than half of the time it is able to respond with a working fix.
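The copy-paste loop described above can be sketched roughly as follows. This is a minimal illustration, not anyone's actual tooling: `repair_loop` and `ask_model` are hypothetical names, and `ask_model` stands in for whatever sends the prompt to the LLM and applies its answer to the source tree. Only `cargo build` itself comes from the post.

```python
import subprocess
from typing import Callable, Optional, Tuple


def cargo_build(project_dir: str) -> Tuple[bool, str]:
    """Run `cargo build` and return (succeeded, compiler diagnostics)."""
    result = subprocess.run(
        ["cargo", "build"], cwd=project_dir, capture_output=True, text=True
    )
    # cargo prints compiler errors on stderr
    return result.returncode == 0, result.stderr


def repair_loop(
    build: Callable[[], Tuple[bool, str]],
    ask_model: Callable[[str], None],  # hypothetical: prompts the LLM and applies its fix
    max_rounds: int = 5,
) -> Optional[int]:
    """Return how many fixes were requested before the build passed, or None."""
    for fixes_requested in range(max_rounds + 1):
        ok, errors = build()
        if ok:
            return fixes_requested
        if fixes_requested < max_rounds:
            # Same prompt shape as in the post: "address: <pasted error message>"
            ask_model(f"address: {errors}")
    return None
```

With a real project this would be called as something like `repair_loop(lambda: cargo_build("."), ask_model)`. Capping the rounds matters, since by the numbers above only a bit more than half of the fixes work.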

2
chaonaut@lemmy.4d2.org

I mean, sure, in that the expectation is that the article is talking about AI in general. The cited paper is discussing LLMs and their ability to complete tasks. So, we have to agree that LLMs are what we mean by AI, and that their ability to complete tasks is a valid metric for AI. If we accept the marketing hype, then of course LLMs are exactly what we've been talking about with AI, and we've accepted LLMs' features and limitations as what AI is. If LLMs are prone to filling in with whatever closest fits the model without regard to accuracy, by accepting LLMs as what we mean by AI, then AI fits to its model without regard to accuracy.
surph_ninja@lemmy.world wrote:

#175

Except you yourself just stated that it was impossible to measure performance of these things. When it’s favorable to AI, you claim it can’t be measured. When it’s unfavorable for AI, you claim of course it’s measurable. Your argument is so flimsy and your understanding so limited that you can’t even stick to a single idea. You’re all over the place.

1
socialmediarefugee@lemmy.world

I've had good results being very specific, like "Generate some python 3 code for me that converts X to Y, recursively through all subdirectories, and converts the files in place."
mangocats@feddit.it wrote:

#176

I have been more successful with baby steps like: "Write a python 3 program that converts X to Y." Tweak prompt until that's working as desired, then: "make it work recursively through all subdirectories" - and again tweak with specifics like converting the files in place, etc. Always very specific, also - force it to fix its own bugs so you can move forward with a clean example as you add complexity. Complexity seems to cap out at a couple of pages of code, at which point "Ooops, something went wrong."
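As a rough picture of where those baby steps end up, here is a minimal sketch. The "X to Y" conversion is not specified in the thread, so `convert_text` is a purely hypothetical stand-in (it just upper-cases text), and the `*.txt` pattern is an assumption too.

```python
from pathlib import Path


def convert_text(text: str) -> str:
    # Hypothetical stand-in for "X to Y"; the real conversion is not given.
    return text.upper()


def convert_file_in_place(path: Path) -> None:
    # Step 1: get a single file converting correctly, overwriting it in place.
    converted = convert_text(path.read_text(encoding="utf-8"))
    path.write_text(converted, encoding="utf-8")


def convert_tree_in_place(root: Path, pattern: str = "*.txt") -> int:
    # Step 2: reuse the working single-file step across all subdirectories.
    count = 0
    for path in sorted(root.rglob(pattern)):
        if path.is_file():
            convert_file_in_place(path)
            count += 1
    return count
```

Each function matches one prompt in the sequence, so the single-file step can be checked before the recursion is layered on top.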

2
zbyte64@awful.systems

A junior developer actually learns from doing the job, an LLM only learns when they update the training corpus and develop an updated model.
jumping_redditor@sh.itjust.works wrote:

#177

An LLM costs less, and won't complain when yelled at.

0
honytawk@feddit.nl

The comparison is about the correctness of their work.

Their lives have nothing to do with it.
davidagain@lemmy.world wrote:

#178

Human lives are the most important thing of all. Profits are irrelevant compared to human lives. I get that that's not how Bezos sees the world, but he's a monstrous outlier.

0
knock_knock_lemmy_in@lemmy.world

Run something with a 70% failure rate 10x and you get to a cumulative 98% pass rate.
LLMs don't get tired and they can be run in parallel.
davidagain@lemmy.world wrote:

#179

What's 0.7^10?
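For reference, the arithmetic can be checked in a few lines. This is only a sketch under two assumptions that do not come from the study: the ten runs are independent, and a passing result can be recognised when it shows up. The function name is made up for illustration.

```python
def at_least_one_success(failure_rate: float, attempts: int) -> float:
    # Chance that not every attempt fails, assuming independent attempts.
    return 1.0 - failure_rate ** attempts


all_fail = 0.7 ** 10
print(f"all 10 runs fail:  {all_fail:.4f}")                          # 0.0282
print(f"at least one pass: {at_least_one_success(0.7, 10):.4f}")     # 0.9718
```

So 0.7^10 is about 2.8%, which puts the chance of at least one pass at about 97%, slightly under the 98% quoted above. If failures are correlated (the same prompt failing the same way every time), the real figure would be lower.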

0
melvin_ferd@lemmy.world

Search AI in Lemmy and check out every article on it. It definitely is the media spreading all the hate. And, like this article, it's often some money-driven yellow journalism.
davidagain@lemmy.world wrote:

#180

I think it's Lemmy users. I see a lot more LLM skepticism here than in the news feeds.

In my experience, LLMs are like the laziest, shittiest know-nothing bozo forced to complete a task with zero attention to detail and zero care about whether it's crap, just doing enough to sound convincing.

1
ulrich@feddit.org

That's not really helping though. The fact that you were transferred to them in the first place instead of directly to a human was an impediment.
eatcasserole@lemmy.world wrote:

#181

Oh absolutely, nothing was gained, time was wasted. My wording was too charitable.

0
surph_ninja@lemmy.world

Except you yourself just stated that it was impossible to measure performance of these things. When it’s favorable to AI, you claim it can’t be measured. When it’s unfavorable for AI, you claim of course it’s measurable. Your argument is so flimsy and your understanding so limited that you can’t even stick to a single idea. You’re all over the place.
chaonaut@lemmy.4d2.org wrote:

#182

It's questionable to measure these things as being reflective of AI, because what AI is changes based on what piece of tech is being hawked as AI, because we're really bad at defining what intelligence is and isn't. You want to claim LLMs as AI? Go ahead, but you also adopt the problems of LLMs as the problems of AIs. Defining AI and thus its metrics is a moving target. When we can't agree on what it is, we can't agree on what it can do.

0
eli001@lemmy.world

This post did not contain any content.
szczuroarturo@programming.dev wrote:

#183

I actually have a fairly positive experience with AI (Copilot using Claude, specifically). Is it wrong a lot if you give it a huge task? Yes, so I don't do that, and I use it as a very targeted solution if I am feeling very lazy that day. Is it fast? Also no. I could actually be faster than AI in some cases.
But is it good if you have been working for 6h and you just don't have enough mental capacity for the rest of the day? Yes. You can just prompt it specifically enough to get the desired result and just accept correct responses. Is it always good? Not really, but good enough. Do I also suck after 3pm? Yes.
My main issue is actually the fact that it saves first and then asks you to pick if you want to use it. Not a problem usually, but if it crashes the generated code stays, so that part sucks.

7
socialmediarefugee@lemmy.world

And let it suck up 10% or so of all of the power in the region.
austinfloyd@ttrpg.network wrote:

#184

And water

10
chaonaut@lemmy.4d2.org

It's questionable to measure these things as being reflective of AI, because what AI is changes based on what piece of tech is being hawked as AI, because we're really bad at defining what intelligence is and isn't. You want to claim LLMs as AI? Go ahead, but you also adopt the problems of LLMs as the problems of AIs. Defining AI and thus its metrics is a moving target. When we can't agree on what it is, we can't agree on what it can do.
surph_ninja@lemmy.world wrote:

#185

Again, you only say it’s a moving target to dispel anything favorable towards AI. Then you do a complete 180 when it’s negative reporting on AI. Makes your argument meaningless, if you can’t even stick to your own point.

0
jumping_redditor@sh.itjust.works

An LLM costs less, and won't complain when yelled at.
zbyte64@awful.systems wrote:

#186

Why would you ever yell at an employee unless you're bad at managing people? And you think you can manage an LLM better because it doesn't complain when you're obviously wrong?

2
surph_ninja@lemmy.world

Again, you only say it’s a moving target to dispel anything favorable towards AI. Then you do a complete 180 when it’s negative reporting on AI. Makes your argument meaningless, if you can’t even stick to your own point.
chaonaut@lemmy.4d2.org wrote:

#187

I mean, I argue that we aren't anywhere near AGI. Maybe we have a better chatbot and autocomplete than we did 20 years ago, but calling that AI? It doesn't really track, does it? With how bad they are at navigating novel situations? With how much time, energy and data it takes to eke out just a tiny bit more model fitness? Sure, these tools are pretty amazing for what they are, but general intelligences, they are not.

1
mangocats@feddit.it

Yes, but the test code "writes itself" - the path is clear, you just have to fill in the blanks.

Writing the proper product code in the first place, that's the valuable challenge.
zbyte64@awful.systems wrote (last edited by zbyte64@awful.systems):

#188

Maybe it is because I started out in QA, but I have to strongly disagree. You should assume the code doesn't work until proven otherwise, AI or not. Then, when it doesn't work, I find it is easier to debug your own code than someone else's, and that includes AI.

0
eli001@lemmy.world

This post did not contain any content.
vanilla_puddinfudge@infosec.pub wrote:

#189

America: "Good enough to handle 911 calls!"

34
eli001@lemmy.world

This post did not contain any content.
candymanager@lemmynsfw.com wrote:

#190

No shit.

9
chaonaut@lemmy.4d2.org

I mean, I argue that we aren't anywhere near AGI. Maybe we have a better chatbot and autocomplete than we did 20 years ago, but calling that AI? It doesn't really track, does it? With how bad they are at navigating novel situations? With how much time, energy and data it takes to eke out just a tiny bit more model fitness? Sure, these tools are pretty amazing for what they are, but general intelligences, they are not.
surph_ninja@lemmy.world wrote:

#191

No one’s claiming these are AGI. Again, you keep having to deflect to irrelevant arguments.

0
katana314@lemmy.world

I'm in a workplace that has tried not to be overbearing about AI, but has encouraged us to use them for coding.

I've tried to give mine some very simple tasks like writing a unit test just for the constructor of a class to verify current behavior, and it generates output that's both wrong and doesn't verify anything.

I'm aware it sometimes gets better with more intricate, specific instructions, and that I can offer it further corrections, but at that point it's not even saving time. I would do this with a human in the hopes that they would continue to retain the knowledge, but I don't even have hopes for AI to apply those lessons in new contexts. In a way, it's been a sigh of relief to realize just like Dotcom, just like 3D TVs, just like home smart assistants, it is a bubble.
ramenjunkie@midwest.social wrote (last edited by ramenjunkie@midwest.social):

#192

I find it's good at making simple Python scripts.

But also, as I evolve them, it starts randomly omitting previous functions. So it helps to know what you are doing at least a bit to catch that.
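One cheap way to catch that kind of silent omission is to compare the function names in the old and new versions of a script. A minimal sketch using the standard `ast` module; the helper names are made up for illustration, and it only looks at top-level functions.

```python
import ast
from typing import List, Set


def top_level_functions(source: str) -> Set[str]:
    # Names of functions defined at module level (nested defs are ignored).
    tree = ast.parse(source)
    return {
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def dropped_functions(old_source: str, new_source: str) -> List[str]:
    # Functions present in the old version but missing from the new one.
    return sorted(top_level_functions(old_source) - top_level_functions(new_source))
```

Running `dropped_functions` on the previous and the regenerated script gives a list of names to ask about before accepting the new version. It says nothing about whether the surviving functions still do the same thing.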

2
surph_ninja@lemmy.world

This is the same kind of short-sighted dismissal I see a lot in the religion vs science argument. When they hinge their pro-religion stance on the things science can’t explain, they’re defending an ever diminishing territory as science grows to explain more things. It’s a stupid strategy with an expiration date on your position.

All of the anti-AI positions that hinge on the low quality or reliability of the output are defending an increasingly diminished stance as the AIs are further refined. And I simply don’t believe that the majority of the people making this argument actually care about the quality of the output. Even when it gets to the point of producing better output than humans across the board, these folks are still going to oppose it regardless. Why not just openly oppose it in general, instead of pinning your position to an argument that grows increasingly irrelevant by the day?

DeepSeek exposed the same issue with the anti-AI people dedicated to the environmental argument. We were shown proof that there’s significant progress in the development of efficient models, and it still didn’t change any of their minds. Because most of them don’t actually care about the environmental impacts. It’s just an anti-AI talking point that resonated with them.

The more baseless these anti-AI stances get, the more it seems to me that it’s a lot of people afraid of change and afraid of the fundamental economic shifts this will require, but they’re embarrassed or unable to articulate that stance. And it doesn’t help that the luddites haven’t been able to predict a single development. Just constantly flailing to craft a new argument to criticize the current models and tech. People are learning not to take these folks seriously.
ramenjunkie@midwest.social wrote (last edited by ramenjunkie@midwest.social):

#193

Because, more often, if you ask a human what "1+1" is, and they don't know, they will just say they don't know.

AI will confidently insist it's 3, and make up math algorithms to prove it.

And every company is pushing AI out on everyone like it's always 10000% correct.

It's also shown it's not intelligent. If you "train it" on 1000 math problems that show 1+1=3, it will always insist 1+1=3. It does not actually know how to add numbers, despite being a computer.

4
