AI agents wrong ~70% of time: Carnegie Mellon study
-
OK, what about the tech journalists who produced articles with those misunderstandings? Surely they know better, yet they still produce articles like this. And the people who care enough about this topic to post these articles usually, I assume, know better too, yet they still spread this crap.
Tech journalists don’t know a damn thing. They’re people that liked computers and could also bullshit an essay in college. That doesn’t make them an expert on anything.
-
I called my local HVAC company recently. They switched to an AI operator. All I wanted was to schedule someone to come out and look at my system. It could not schedule an appointment. Like if you can't perform the simplest of tasks, what are you even doing? Other than acting obnoxiously excited to receive a phone call?
I've had to deal with a couple of these "AI" customer service thingies. The only helpful thing I've been able to get them to do is refer me to a human.
-
Exactly. Vibe coding is bad, but generating code for something you don't touch often but can absolutely understand is totally fine. I've used it to generate SQL queries for relatively odd cases, such as CTEs for improving performance for large queries with common sub-queries. I always forget the syntax since I only do it like once/year, and LLMs are great at generating something reasonable that I can tweak for my tables.
I always forget the syntax
Me with literally every bit of code I touch, always and forever.
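For anyone else who forgets the syntax, here is a minimal sketch of the kind of CTE meant above, run through Python's built-in sqlite3 so it is self-contained. The `orders` table, its columns, and the `customer_totals` name are made up for illustration and are not from anyone's real schema. The point is that the shared sub-query is written once and referenced twice. Whether the database actually evaluates it only once depends on the engine, so treat any performance gain as something to measure rather than assume.

```python
import sqlite3

# In-memory database with a hypothetical "orders" table (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
INSERT INTO orders (customer, amount) VALUES
  ('alice', 120.0), ('alice', 80.0), ('bob', 40.0), ('carol', 300.0);
""")

# The CTE names the per-customer total once; the outer query then uses it
# both as the row source and inside the AVG() comparison.
QUERY = """
WITH customer_totals AS (
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
)
SELECT customer, total
FROM customer_totals
WHERE total > (SELECT AVG(total) FROM customer_totals)
ORDER BY customer;
"""

# Customers whose total spend is above the average customer total.
rows = conn.execute(QUERY).fetchall()
print(rows)
```

Swap in your own table and column names; the `WITH name AS (...)` part is the bit that is easy to forget.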
-
I've had to deal with a couple of these "AI" customer service thingies. The only helpful thing I've been able to get them to do is refer me to a human.
That's not really helping though. The fact that you were transferred to them in the first place instead of directly to a human was an impediment.
-
LLMs are like a multitool: they can do lots of easy things mostly fine, as long as the task is not complicated and doesn't need to be exactly right. But they are being promoted as a whole toolkit, as if they could do the same work as effectively as a hammer, power drill, table saw, vise, and wrench.
and doesn't need to be exactly right
What kind of tasks do you consider that don't need to be exactly right?
-
For some reason, agents work better when you tell them that the accuracy of the work is a matter of life or death. I've made a little script that gives me BibTeX for a folder of PDFs, and this is how I got it to be usable.
Did you make it? Or did you prompt it? They ain't quite the same.
-
This post did not contain any content.
So no different than answers from middle management I guess?
-
Tech journalists don’t know a damn thing. They’re people that liked computers and could also bullshit an essay in college. That doesn’t make them an expert on anything.
... And nowadays they let the LLM help with the bullshittery
-
So no different than answers from middle management I guess?
At least AI won't fire you.
-
At least AI won't fire you.
Idk, the new iterations just might. Shit, Amazon already uses automated systems to fire people.
-
I'd just like to point out that, from the perspective of somebody watching AI develop for the past 10 years, completing 30% of automated tasks successfully is pretty good! Ten years ago they could not do this at all. Overlooking all the other issues with AI, I think we are all irritated with the AI hype people for saying things like they can be right 100% of the time -- Amazon's new CEO actually said they would be able to achieve 100% accuracy this year, lmao. But being able to do 30% of tasks successfully is already useful.
-
and doesn't need to be exactly right
What kind of tasks do you consider that don't need to be exactly right?
Make a basic HTML template. I'll be changing it up anyway.
-
I'd just like to point out that, from the perspective of somebody watching AI develop for the past 10 years, completing 30% of automated tasks successfully is pretty good! Ten years ago they could not do this at all. Overlooking all the other issues with AI, I think we are all irritated with the AI hype people for saying things like they can be right 100% of the time -- Amazon's new CEO actually said they would be able to achieve 100% accuracy this year, lmao. But being able to do 30% of tasks successfully is already useful.
Please stop.
-
Color me surprised
-
and doesn't need to be exactly right
What kind of tasks do you consider that don't need to be exactly right?
Things that serve as inspiration or approximations: layout examples, possible correlations between data sets where coincidence still has to be filtered out, timeline estimates, and basically anything that is close enough for a human to take the output and then do something with it.
For example, if you put in a list of ingredients it can spit out recipes that may or may not be what you want, but it can be an inspiration. Taking the output and cooking without any review and consideration would be risky.
-
OK, what about the tech journalists who produced articles with those misunderstandings? Surely they know better, yet they still produce articles like this. And the people who care enough about this topic to post these articles usually, I assume, know better too, yet they still spread this crap.
Check out Ed Zitron's angry reporting on tech journalists fawning over this garbage and reporting on it uncritically. He has a newsletter and a podcast.
-
At least AI won't fire you.
It kinda does when you ask it something it doesn't like.
-
and doesn't need to be exactly right
What kind of tasks do you consider that don't need to be exactly right?
Most. I've used ChatGPT to sketch an outline of a document, reformulate accomplishments into review bullets, rephrase a task I didn't understand, and similar stuff. None of it needed to be anywhere near perfect or complete.
Edit: and my favorite, "what's the word for..."
-
Please stop.
I'm not claiming that the use of AI is ethical. If you want to fight back you have to take it seriously though.
-
I'm not claiming that the use of AI is ethical. If you want to fight back you have to take it seriously though.
It can't do 30% of tasks correctly. It can do tasks correctly as much as 30% of the time, and since it's LLM shit you know those numbers have been more massaged than any human in history has ever been.