AI agents wrong ~70% of time: Carnegie Mellon study
-
I’m sorry as an AI I cannot physically color you shocked. I can help you with AWS services and questions.
How do I set up event driven document ingestion from OneDrive located on an Azure tenant to Amazon DocumentDB? Ingestion must be near-realtime, durable, and have some form of DLQ.
-
Are you guys sure? The media seems to be where a lot of LLM hate originates.
Whatever gets ad views
-
You get how that's fucking useless, generally?
yes, that's generally useless. It should not be shoved down people's throats. 30% accuracy still has its uses, especially if the result can be programmatically verified.
-
It doesn't matter if you need a human to review. AI has no way of distinguishing between success and failure. Either way a human will have to review 100% of those tasks.
Right, so this is really only useful in cases where either it's vastly easier to verify an answer than posit one, or if a conventional program can verify the result of the AI's output.
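To make the "a conventional program can verify the result" case concrete, here is a rough sketch of a generate-and-check loop. Everything in it is hypothetical: `generate` stands in for whatever produces the AI output and `verify` for the ordinary program that checks it. The probability helper assumes each attempt is an independent 30% shot, which is a simplification and may well not hold for a real model.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_until_verified(
    generate: Callable[[], T],
    verify: Callable[[T], bool],
    max_attempts: int,
) -> Optional[T]:
    """Call generate() until verify() accepts a result or attempts run out."""
    for _ in range(max_attempts):
        candidate = generate()
        if verify(candidate):
            return candidate
    return None  # every attempt failed verification


def success_probability(p: float, attempts: int) -> float:
    """Chance that at least one of `attempts` tries succeeds.

    Assumes the tries are independent, which is an idealization.
    """
    return 1.0 - (1.0 - p) ** attempts
```

Under that independence assumption, a 30% per-try success rate gives roughly 83% after five tries and roughly 97% after ten. If the failures are correlated (the same prompt failing the same way every time), retries buy much less.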
-
How do I set up event driven document ingestion from OneDrive located on an Azure tenant to Amazon DocumentDB? Ingestion must be near-realtime, durable, and have some form of DLQ.
I see you mention Azure and will assume you’re doing a one time migration.
Start by moving everything from OneDrive to S3. As an AI I’m told that bitches love S3. From there you can subscribe to create events on buckets and add events to an SQS queue. Here you can enable a DLQ for failed events.
From there add a Lambda to listen for SQS events. You should enable provisioned concurrency for speed, the ability for AWS to bill you more, and so that you can have a dandy of a time figuring out why an old version of your lambda is still running even though you deployed the latest version and everything telling you that creating a new ID for the lambda each time to fix it fucking lies.
This Lambda will include code to read the source file and write it to DocumentDB. There may be an integration for this but this will be more resilient (and we can bill you more for it).
Would you like to see sample CDK code? Tough shit because all I can do is assist with questions on AWS services.
-
yes, that's generally useless. It should not be shoved down people's throats. 30% accuracy still has its uses, especially if the result can be programmatically verified.
Less broadly useful than 20 tons of mixed texture human shit, and more ecologically devastating.
-
Less broadly useful than 20 tons of mixed texture human shit, and more ecologically devastating.
Are you just trolling or do you seriously not understand how something which can do a task correctly with 30% reliability can be made useful if the result can be automatically verified?
-
Are you just trolling or do you seriously not understand how something which can do a task correctly with 30% reliability can be made useful if the result can be automatically verified?
It's not a magical 30%, factors apply. It's not even a mind that thinks and just isn't very good.
This isn't like a magical dice that gives you truth on a 5 or a 6, and lies on 1,2,3,7, and for.
This is a (very complicated, very large) language or other data graph that programmatically identifies an average. 30% of the time, according to one Potemkin-ass demonstration.
Which means the more feasible automatic verification is, the easier it is to just use a simpler, cheaper tool that will give you a better, more reliable answer much faster. And 20 tons of human shit has uses! If you know its provenance, there's all sorts of population-level public health surveillance you can do to get ahead of disease trends! It's also got some good agricultural stuff in it, phosphorus and stuff, if you can extract it.
Stop. Just please fucking stop glazing these NERVE-ass fascist shit-goblins.
-
It's not a magical 30%, factors apply. It's not even a mind that thinks and just isn't very good.
This isn't like a magical dice that gives you truth on a 5 or a 6, and lies on 1,2,3,7, and for.
This is a (very complicated, very large) language or other data graph that programmatically identifies an average. 30% of the time, according to one Potemkin-ass demonstration.
Which means the more feasible automatic verification is, the easier it is to just use a simpler, cheaper tool that will give you a better, more reliable answer much faster. And 20 tons of human shit has uses! If you know its provenance, there's all sorts of population-level public health surveillance you can do to get ahead of disease trends! It's also got some good agricultural stuff in it, phosphorus and stuff, if you can extract it.
Stop. Just please fucking stop glazing these NERVE-ass fascist shit-goblins.
I think everyone in the universe is aware of how LLMs work by now, you don't need to explain it to someone just because they think LLMs are more useful than you do.
IDK what you mean by glazing but if by "glaze" you mean "understanding the potential threat of AI to society instead of hiding under a rock and pretending it's as useless as a plastic radio," then no, I won't stop.
-
OK, what about the tech journalists who produced articles with those misunderstandings? Surely they know better, yet they still produce articles like this. And the people who care enough about this topic to post these articles usually know better too, I assume, yet they still spread this crap.
I liked when the Chicago Sun-Times put out a summer reading list and only a third of the books on it were real. Each book had a summary of the plot next to it too. They later apologized for it.
-
So no different than answers from middle management I guess?
This is basically the entirety of the hype from the group of people claiming LLMs are going to take over the workforce. Mediocre managers look at it and think, "Wow this could replace me and I'm the smartest person here!"
Sure, Jan.
-
I think everyone in the universe is aware of how LLMs work by now, you don't need to explain it to someone just because they think LLMs are more useful than you do.
IDK what you mean by glazing but if by "glaze" you mean "understanding the potential threat of AI to society instead of hiding under a rock and pretending it's as useless as a plastic radio," then no, I won't stop.
It's absolutely dangerous but it doesn't have to work even a little to do damage; hell, it already has. Your thing just makes it sound much more capable than it is. And it is not.
Also, it's not AI.
-
"...for multi-step tasks"
-
The ones being implemented into emergency call centers are better though? Right?
I wonder how the evil Palantir uses its AI.
-
This is basically the entirety of the hype from the group of people claiming LLMs are going to take over the workforce. Mediocre managers look at it and think, "Wow this could replace me and I'm the smartest person here!"
Sure, Jan.
I won't tolerate Jan slander here. I know he's just a builder, but his life path has the highest probability of producing a great person!
-
For me as a software developer the accuracy is more in the 95%+ range.
On one hand the built-in Copilot chat widget in IntelliJ basically replaces a lot of my Google queries.
On the other hand it is rather fucking good at executing some rewrites that are a fucking chore to do manually, but can easily be done by Copilot.
Imagine you have a script that initializes your DB with some test data. You have an INSERT INTO statement with lots of columns and rows, so:
INSERT INTO (column1, ..., column n)
VALUES row 1,
row 2,
row n
Adding a new column with test data for each row is a PITA, but Copilot handles it without issue.
Similarly, when writing unit tests you do a lot of edge case testing, which is a bunch of almost identical tests with maybe one variable changing. At most you write one of those tests, then Copilot will auto-generate the rest after you name the next unit test. It's pretty good at guessing what you want to do in that test, at least with my naming scheme.
So yeah, it's way overrated for many, many things, but for programming it's a pretty awesome productivity tool.
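For anyone wondering what "almost identical tests with one variable changing" looks like, here is a small made-up sketch. The `clamp` function and all the case names are hypothetical, picked only for illustration, and this is the table-driven way of writing such tests with `unittest.subTest`, not the one-test-per-case style described above.

```python
import unittest


def clamp(value: int, low: int, high: int) -> int:
    """Hypothetical function under test: restrict value to [low, high]."""
    return max(low, min(value, high))


class ClampEdgeCases(unittest.TestCase):
    # Each row is (case name, input value, expected result).
    # Only the input varies; the bounds stay fixed at 0 and 10.
    CASES = [
        ("below_range", -5, 0),
        ("at_lower_bound", 0, 0),
        ("inside_range", 7, 7),
        ("at_upper_bound", 10, 10),
        ("above_range", 42, 10),
    ]

    def test_clamp_edge_cases(self):
        for name, value, expected in self.CASES:
            # subTest reports each failing row separately by name.
            with self.subTest(name):
                self.assertEqual(clamp(value, 0, 10), expected)
```

Whether the rows are typed by hand or suggested by a completion tool, each one is cheap to check by eye, which is probably why this kind of boilerplate is a comfortable fit for autocomplete.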
-
Did you make it? Or did you prompt it? They ain't quite the same.
It calls ollama with a prompt. It's a bit complex because it also renames, moves, and sorts stuff.
-
It's absolutely dangerous but it doesn't have to work even a little to do damage; hell, it already has. Your thing just makes it sound much more capable than it is. And it is not.
Also, it's not AI.
semantics.
-
semantics.
No, it matters. You're pushing the lie they want pushed.