AI agents wrong ~70% of time: Carnegie Mellon study
-
Search for AI on Lemmy and check out every article on it. It definitely is the media spreading all the hate. And articles like this one are often money-grab yellow journalism.
I think it's Lemmy users. I see a lot more LLM skepticism here than in the news feeds.
In my experience, LLMs are like the laziest, shittiest know-nothing bozo forced to complete a task with zero attention to detail and zero care about whether it's crap, just doing enough to sound convincing.
-
That's not really helping though. The fact that you were transferred to them in the first place instead of directly to a human was an impediment.
Oh absolutely, nothing was gained, time was wasted. My wording was too charitable.
-
Except you yourself just stated that it was impossible to measure performance of these things. When it’s favorable to AI, you claim it can’t be measured. When it’s unfavorable for AI, you claim of course it’s measurable. Your argument is so flimsy and your understanding so limited that you can’t even stick to a single idea. You’re all over the place.
It's questionable to measure these things as being reflective of AI, because what AI is changes based on what piece of tech is being hawked as AI, and because we're really bad at defining what intelligence is and isn't. You want to claim LLMs as AI? Go ahead, but then you also adopt the problems of LLMs as the problems of AI. Defining AI, and thus its metrics, is a moving target. When we can't agree on what it is, we can't agree on what it can do.
-
I actually have a fairly positive experience with AI (Copilot using Claude specifically). Is it wrong a lot if you give it a huge task? Yes, so I don't do that, and I use it as a very targeted solution when I'm feeling very lazy that day. Is it fast? Also not; I could actually be faster than the AI in some cases.
But is it good when you've been working for 6 hours and just don't have enough mental capacity left for the rest of the day? Yes. You can just prompt it specifically enough to get the desired result and only accept the correct responses. Is it always good? Not really, but good enough. Do I also suck after 3pm? Yes.
My main issue is actually the fact that it saves first and then asks you to pick whether you want to use it. Not a problem usually, but if it crashes, the generated code stays, so that part sucks.
-
And let it suck up 10% or so of all of the power in the region.
And water
-
It's questionable to measure these things as being reflective of AI, because what AI is changes based on what piece of tech is being hawked as AI, and because we're really bad at defining what intelligence is and isn't. You want to claim LLMs as AI? Go ahead, but then you also adopt the problems of LLMs as the problems of AI. Defining AI, and thus its metrics, is a moving target. When we can't agree on what it is, we can't agree on what it can do.
Again, you only say it’s a moving target to dispel anything favorable towards AI. Then you do a complete 180 when it’s negative reporting on AI. Makes your argument meaningless, if you can’t even stick to your own point.
-
An LLM costs less, and won't complain when yelled at.
Why would you ever yell at an employee unless you're bad at managing people? And you think you can manage an LLM better because it doesn't complain when you're obviously wrong?
-
Again, you only say it’s a moving target to dispel anything favorable towards AI. Then you do a complete 180 when it’s negative reporting on AI. Makes your argument meaningless, if you can’t even stick to your own point.
I mean, I argue that we aren't anywhere near AGI. Maybe we have a better chatbot and autocomplete than we did 20 years ago, but calling that AI? It doesn't really track, does it? With how bad they are at navigating novel situations? With how much time, energy, and data it takes to eke out just a tiny bit more model fitness? Sure, these tools are pretty amazing for what they are, but general intelligences they are not.
-
Yes, but the test code "writes itself" - the path is clear, you just have to fill in the blanks.
Writing the proper product code in the first place, that's the valuable challenge.
Maybe it is because I started out in QA, but I have to strongly disagree. You should assume the code doesn't work until proven otherwise, AI or not. And when it doesn't work, I find it is easier to debug your own code than someone else's, and that includes AI's.
-
America: "Good enough to handle 911 calls!"
-
No shit.
-
I mean, I argue that we aren't anywhere near AGI. Maybe we have a better chatbot and autocomplete than we did 20 years ago, but calling that AI? It doesn't really track, does it? With how bad they are at navigating novel situations? With how much time, energy, and data it takes to eke out just a tiny bit more model fitness? Sure, these tools are pretty amazing for what they are, but general intelligences they are not.
No one’s claiming these are AGI. Again, you keep having to deflect to irrelevant arguments.
-
I'm in a workplace that has tried not to be overbearing about AI, but has encouraged us to use them for coding.
I've tried to give mine some very simple tasks like writing a unit test just for the constructor of a class to verify current behavior, and it generates output that's both wrong and doesn't verify anything.
I'm aware it sometimes gets better with more intricate, specific instructions, and that I can offer it further corrections, but at that point it's not even saving time. I would do this with a human in the hope that they would retain the knowledge, but I don't even have hopes of AI applying those lessons in new contexts. In a way, it's been a sigh of relief to realize that, just like the dotcom boom, just like 3D TVs, just like home smart assistants, it is a bubble.
I find it's good at making simple Python scripts.
But also, as I evolve them, it starts randomly omitting previous functions. So it helps to know what you are doing at least a bit to catch that.
-
This is the same kind of short-sighted dismissal I see a lot in the religion vs science argument. When they hinge their pro-religion stance on the things science can’t explain, they’re defending an ever diminishing territory as science grows to explain more things. It’s a stupid strategy with an expiration date on your position.
All of the anti-AI positions that hinge on the low quality or reliability of the output are defending an increasingly diminished stance as the AIs are further refined. And I simply don't believe that the majority of the people making this argument actually care about the quality of the output. Even when it gets to the point of producing better output than humans across the board, these folks are still going to oppose it regardless. Why not just openly oppose it in general, instead of pinning your position to an argument that grows increasingly irrelevant by the day?
DeepSeek exposed the same issue with the anti-AI people dedicated to the environmental argument. We were shown proof that there’s significant progress in the development of efficient models, and it still didn’t change any of their minds. Because most of them don’t actually care about the environmental impacts. It’s just an anti-AI talking point that resonated with them.
The more baseless these anti-AI stances get, the more it seems to me that it’s a lot of people afraid of change and afraid of the fundamental economic shifts this will require, but they’re embarrassed or unable to articulate that stance. And it doesn’t help that the luddites haven’t been able to predict a single development. Just constantly flailing to craft a new argument to criticize the current models and tech. People are learning not to take these folks seriously.
Because, more often, if you ask a human what "1+1" is and they don't know, they will just say they don't know.
AI will confidently insist it's 3, and make up math algorithms to prove it.
And every company is pushing AI onto everyone like it's always 10000% correct.
It's also shown it's not intelligent. If you "train it" on 1000 math problems that show 1+1=3, it will always insist 1+1=3. It does not actually know how to add numbers, despite being a computer.
-
The comparison is about the correctness of their work.
Their lives have nothing to do with it.
So, first, bad comparison.
Second: if that's the equivalent, why not do the one that makes the wealthy let a few pennies fall on actual people?
-
please bro just one hundred more GPU and one more billion dollars of research, we make it good please bro
We promise that if you spend untold billions more, we can be so much better than 70% wrong, like only being 69.9% wrong.
-
What's 0.7^10?
About 0.03
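In other words, if each of 10 chained steps independently succeeds 70% of the time, the whole run succeeds roughly 3% of the time. A rough sketch (the 10-step count and the independence assumption are just for illustration; real agent steps aren't independent):

```python
# Back-of-the-envelope: probability that a chain of steps all succeed,
# assuming a fixed per-step success rate and independent steps.
per_step_success = 0.7
steps = 10

chain_success = per_step_success ** steps
print(f"{chain_success:.3f}")  # ~0.028
```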
-
It's about agents, which implies multi-step, since those are meant to execute a series of tasks, as opposed to studies looking at base LLM performance.
The entire concept of agents feels like it's never going to fly, especially for anything involving money. I am not going to tell an AI I want to bake a cake and trust that it will find the correct ingredients at the right price and DoorDash them to me.
-
Hitler liked to paint, doesn't make painting wrong. The fact that big tech is pushing AI isn't evidence against the utility of AI.
That common parlance is to call machine learning "AI" these days doesn't matter to me in the slightest. Do you have a definition of "intelligence"? Do you object when pathfinding is called AI? Or STRIPS? Or bots in a video game? Dare I say it, the main difference between those AIs and LLMs is their generality -- so why not just call it GAI at this point tbh. This is a question of semantics so it really doesn't matter to the deeper question. Doesn't matter if you call it AI or not, LLMs work the same way either way.
Semantics, of course, famously never matter.
-
The problem is they are not i.i.d., so this doesn't really work. It works a bit, which is in my opinion why chain-of-thought is effective (it gives the LLM a chance to posit a couple answers first). However, we're already looking at "agents," so they're probably already doing chain-of-thought.
Very fair comment. In my experience, even increasing the temperature, you get stuck in local minima.
I was just trying to illustrate how 70% failure rates can still be useful.
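As a purely illustrative sketch (hypothetical numbers, and pretending samples are i.i.d., which as you point out they aren't): if you have a cheap way to check an answer, even a 30% per-attempt success rate climbs fast with retries.

```python
# Best-of-k under an (unrealistic) independence assumption: probability that
# at least one of k attempts succeeds, given a 30% per-attempt success rate.
# Correlated LLM samples make this an optimistic upper bound.
p_success = 0.3

for k in (1, 3, 5, 10):
    at_least_one = 1 - (1 - p_success) ** k
    print(f"k={k:2d}: {at_least_one:.2f}")
# k= 1: 0.30
# k= 3: 0.66
# k= 5: 0.83
# k=10: 0.97
```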
-