AI agents wrong ~70% of time: Carnegie Mellon study

Technology
  • That is such a ridiculous idea. Just because you see hate for it in the media doesn't mean it originated there. I'll have you know that I have embarrassed myself by screaming at robot phone receptionists for years now. Stupid fuckers pretending to be people but not knowing shit. I was born ready to hate LLMs, and I'm not gonna have you claim that CNN made me do it.

    Search AI on Lemmy and check out every article on it. It definitely is the media spreading all the hate. And articles like this one are often just money-driven yellow journalism.

  • I'm in a workplace that has tried not to be overbearing about AI, but has encouraged us to use them for coding.

    I've tried to give mine some very simple tasks like writing a unit test just for the constructor of a class to verify current behavior, and it generates output that's both wrong and doesn't verify anything.

    I'm aware it sometimes gets better with more intricate, specific instructions, and that I can offer it further corrections, but at that point it's not even saving time. I would do this with a human in the hopes that they would retain the knowledge, but I don't even have hopes for AI to apply those lessons in new contexts. In a way, it's been a relief to realize that, just like the dot-com era, 3D TVs, and home smart assistants, this is a bubble.
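    For concreteness, the kind of task described above might look like this minimal sketch (the `Account` class and its fields are hypothetical stand-ins, not anything from the comment):

    ```python
    import unittest

    # Hypothetical class under test; stands in for whatever class the commenter meant.
    class Account:
        def __init__(self, owner, balance=0):
            self.owner = owner
            self.balance = balance

    class TestAccountConstructor(unittest.TestCase):
        """Pin down current constructor behavior so later changes get caught."""

        def test_constructor_sets_fields(self):
            acct = Account("alice", balance=100)
            self.assertEqual(acct.owner, "alice")
            self.assertEqual(acct.balance, 100)

        def test_constructor_default_balance(self):
            self.assertEqual(Account("bob").balance, 0)

    if __name__ == "__main__":
        unittest.main()
    ```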

    The first half dozen times I tried AI for code, across the past year or so, it failed pretty much as you describe.

    Finally, I hit on some things it can do. For me, keeping the instructions more general (not specifying certain libraries, for instance) was the key to getting something that actually does something. Also, if it doesn't show you the whole program, get it to show you the whole thing, and make it fix its own mistakes so you can build on working code with later requests.

  • No, it matters. You're pushing the lie they want pushed.

    And you're pushing a hate train with no aspect of nuance to show for it.

    Seems like you are even less than 30% useful. And that is mainly because you can be used as fertilizer.

  • and doesn't need to be exactly right

    What kind of tasks do you have in mind that don't need to be exactly right?

    Description generators for TTRPGs, since you will read through the output afterwards anyway and correct it when necessary.

    Generating lists of ideas. For creative writing, getting a bunch of ideas you can pick and choose from that fit the narrative you want.

    A search engine like Perplexity.ai, which, after searching, summarizes each web page and adds a link to it. If the summary seems promising, you go to the real page to verify the actual information.

    Simple code like HTML pages and boilerplate code that you will still review afterwards anyway.

  • When LLMs get it right, it's because they're summarizing a Stack Overflow or GitHub snippet they were trained on. But you lose all the benefits of other humans commenting on the context, pitfalls, and alternatives.

    You mean things you had to do anyway even if you didn't use LLMs?

  • That’s literally how “AI agents” are being marketed. “Tell it to do a thing and it will do it for you.”

    So? That doesn't mean they are supposed to be used like that.

    Show me any marketing that isn't full of lies.

  • The first half dozen times I tried AI for code, across the past year or so, it failed pretty much as you describe.

    Finally, I hit on some things it can do. For me, keeping the instructions more general (not specifying certain libraries, for instance) was the key to getting something that actually does something. Also, if it doesn't show you the whole program, get it to show you the whole thing, and make it fix its own mistakes so you can build on working code with later requests.

    Have you tried insulting the AI in the system prompt (along with other tweaks to the system prompt)?

    I'm not joking, it really works

    For example:

    Instead of "You are an intelligent coding assistant..."

    "You are an absolute fucking idiot who can barely code..."

  • Emotion > Facts. Most people have been trained to blindly accept things and cheer on whatever fits their agenda. Like tech bros exaggerating LLMs, or people like you misrepresenting LLMs as mere statistical word generators without intelligence. That's like saying a computer is just wires and switches, or missing the forest for the trees. Both are equally false.

    Yet if it fits with emotional needs or with dogma, then others will agree. It's a convenient and comforting "A vs B" worldview we've been trained to accept. And so the satisfying notion and the misinformation keep spreading.

    LLMs tell us more about human intelligence and the human slop we've been generating. They tell us that most people are not that much more than statistical word generators.

    people like you misrepresenting LLMs as mere statistical word generators without intelligence.

    You've bought into the hype. I won't try to argue with you because you aren't cognizant of reality.

  • This post did not contain any content.

    We have created the overconfident intern in digital form.

  • When LLMs get it right, it's because they're summarizing a Stack Overflow or GitHub snippet they were trained on. But you lose all the benefits of other humans commenting on the context, pitfalls, and alternatives.

    You’re not wrong, but often I’m just trying to do something I’ve done a thousand times before and I already know the pitfalls. Also, I’m sure I’ve copied code from Stack Overflow before.

  • This post did not contain any content.

    Hey I went there

  • people like you misrepresenting LLMs as mere statistical word generators without intelligence.

    You've bought into the hype. I won't try to argue with you because you aren't cognizant of reality.

    You're projecting. Every accusation is a confession.

  • Have you tried insulting the AI in the system prompt (along with other tweaks to the system prompt)?

    I'm not joking, it really works

    For example:

    Instead of "You are an intelligent coding assistant..."

    "You are an absolute fucking idiot who can barely code..."

    “You are an absolute fucking idiot who can barely code…”

    Honestly, that's what you have to do. It's the only way I can get through using Claude.ai. I treat it like it's an absolute moron, I insult it, I "yell" at it, I threaten it, and guess what? The solutions have gotten better. Not great, but a hell of a lot better than they used to be. It really works. It forces it to really think through the problem, research solutions, cite sources, etc. I have even told it I'll cancel my subscription if it gets it wrong.

    no more "do this and this and then this but do this first and then do this" after calling it a "fucking moron" and what have you it will provide an answer and just say "done."

  • “You are an absolute fucking idiot who can barely code…”

    Honestly, that's what you have to do. It's the only way I can get through using Claude.ai. I treat it like it's an absolute moron, I insult it, I "yell" at it, I threaten it, and guess what? The solutions have gotten better. Not great, but a hell of a lot better than they used to be. It really works. It forces it to really think through the problem, research solutions, cite sources, etc. I have even told it I'll cancel my subscription if it gets it wrong.

    no more "do this and this and then this but do this first and then do this" after calling it a "fucking moron" and what have you it will provide an answer and just say "done."

    This guy is the moral lesson at the start of the apocalypse movie

  • This post did not contain any content.

    This is the same kind of short-sighted dismissal I see a lot in the religion-vs-science argument. When people hinge their pro-religion stance on the things science can’t explain, they’re defending an ever-diminishing territory as science grows to explain more. It’s a stupid strategy with an expiration date on your position.

    All of the anti-AI positions that hinge on the low quality or reliability of the output are defending an increasingly diminished stance as the AIs are further refined. And I simply don’t believe that the majority of the people making this argument actually care about the quality of the output. Even when it gets to the point of producing better output than humans across the board, these folks are still going to oppose it regardless. Why not just openly oppose it in general, instead of pinning your position to an argument that grows increasingly irrelevant by the day?

    DeepSeek exposed the same issue with the anti-AI people dedicated to the environmental argument. We were shown proof that there’s significant progress in the development of efficient models, and it still didn’t change any of their minds. Because most of them don’t actually care about the environmental impacts. It’s just an anti-AI talking point that resonated with them.

    The more baseless these anti-AI stances get, the more it seems to me that it’s a lot of people afraid of change and afraid of the fundamental economic shifts this will require, but they’re embarrassed or unable to articulate that stance. And it doesn’t help that the luddites haven’t been able to predict a single development. Just constantly flailing to craft a new argument to criticize the current models and tech. People are learning not to take these folks seriously.

  • Have you tried insulting the AI in the system prompt (along with other tweaks to the system prompt)?

    I'm not joking, it really works

    For example:

    Instead of "You are an intelligent coding assistant..."

    "You are an absolute fucking idiot who can barely code..."

    I frequently find myself prompting it: "Now show me the whole program with all the errors corrected." Sometimes I have to ask that two or three times, in different ways, before it coughs up the next iteration ready to copy-paste-test. Most times when it gives errors, I'll just write "address: " and copy-paste the error message in. Frequently the text of the AI response will apologize; less frequently it will actually fix the error.
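    That paste-the-error-back-in routine is easy to mechanize. A minimal sketch of the loop, assuming the same OpenAI-style client as in the earlier sketch (model name, prompts, and the three-round cap are all illustrative):

    ```python
    import subprocess
    from openai import OpenAI

    client = OpenAI()

    def run(code: str) -> str:
        """Run the candidate program; return stderr ('' means it ran cleanly)."""
        result = subprocess.run(["python", "-c", code], capture_output=True, text=True)
        return result.stderr

    def ask(messages: list) -> str:
        reply = client.chat.completions.create(model="gpt-4o", messages=messages)
        # Real use would extract the fenced code block from the reply; elided here.
        return reply.choices[0].message.content

    messages = [{"role": "user", "content": "Write a complete Python program that ..."}]
    code = ask(messages)
    for _ in range(3):  # a few rounds of "address: <error>", as in the comment
        error = run(code)
        if not error:
            break
        messages += [{"role": "assistant", "content": code},
                     {"role": "user", "content": "address: " + error}]
        code = ask(messages)
    ```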

  • This guy is the moral lesson at the start of the apocalypse movie

    He's developing a toxic relationship with his AI agent. I don't think it's the best way to get what you want (demonstrating how to be abusive to the AI), but maybe it's the only method he is capable of getting results with.

  • This is the same kind of short-sighted dismissal I see a lot in the religion-vs-science argument. When people hinge their pro-religion stance on the things science can’t explain, they’re defending an ever-diminishing territory as science grows to explain more. It’s a stupid strategy with an expiration date on your position.

    All of the anti-AI positions that hinge on the low quality or reliability of the output are defending an increasingly diminished stance as the AIs are further refined. And I simply don’t believe that the majority of the people making this argument actually care about the quality of the output. Even when it gets to the point of producing better output than humans across the board, these folks are still going to oppose it regardless. Why not just openly oppose it in general, instead of pinning your position to an argument that grows increasingly irrelevant by the day?

    DeepSeek exposed the same issue with the anti-AI people dedicated to the environmental argument. We were shown proof that there’s significant progress in the development of efficient models, and it still didn’t change any of their minds. Because most of them don’t actually care about the environmental impacts. It’s just an anti-AI talking point that resonated with them.

    The more baseless these anti-AI stances get, the more it seems to me that it’s a lot of people afraid of change and afraid of the fundamental economic shifts this will require, but they’re embarrassed or unable to articulate that stance. And it doesn’t help that the luddites haven’t been able to predict a single development. Just constantly flailing to craft a new argument to criticize the current models and tech. People are learning not to take these folks seriously.

    Maybe the marketers should be a bit more picky about what they slap "AI" on, and maybe decision-makers should be a little less eager to follow whatever Better Autocomplete spits out. But maybe that's just me, and we really should be pretending that all these algorithms have made humans obsolete and that generating convincing language is better than correspondence with reality.

  • Maybe the marketers should be a bit more picky about what they slap "AI" on, and maybe decision-makers should be a little less eager to follow whatever Better Autocomplete spits out. But maybe that's just me, and we really should be pretending that all these algorithms have made humans obsolete and that generating convincing language is better than correspondence with reality.

    I’m not sure the anti-AI marketing stance is any more solid of a position. Though it’s probably easier to defend, since it’s so vague and not based on anything measurable.

  • I’m not sure the anti-AI marketing stance is any more solid of a position. Though it’s probably easier to defend, since it’s so vague and not based on anything measurable.

    Calling AI measurable is somewhat unfounded. Between not having a coherent, agreed-upon definition of what does and does not constitute an AI (we are, after all, discussing LLMs as though they were AGI) and the difficulty of discussing the qualifications of human intelligence, saying that a given metric captures how well a thing is an AI isn't really founded on anything but preference. We could, for example, say that mathematical ability is indicative of intelligence, but claiming FLOPS is a proxy for intelligence falls rather flat. We can measure things about the various algorithms, but that's an awfully long way off from talking about AI itself (unless we've bought into the marketing hype).

  • 349 votes
    72 posts
    207 views
    Sure, the internet is more practical. But consider the odds of being caught in the time required to execute a decent strike plan, even one as vague as: "we're going to Amerika and we're going to hit 50 high-profile targets on July 4th, one in every state" (dear NSA analyst, this is entirely hypothetical). Your agents spread to the field and start assessing from the ground the highest-impact targets attainable with their resources, with extensive back-and-forth between the field and central command daily for 90 days of prep, all carried out on 270 different active social media channels as innocuous-looking photo exchanges, with 540 pre-arranged algorithms hiding the messages in the noise of the image bits. Chances of security agencies picking this up from the communication itself? About 100x less than them noticing 50 teams of activists deployed to 50 states at roughly the same time, even if those teams never communicate anything.

    HF (more often called shortwave) is well suited to the numbers game: a deep-cover agent lying in wait, potentially for years, whose only "tell" is an odd habit of listening to the radio most nights. All they're waiting for is a binary message: if you hear the sequence 3 17 22, you are to make contact for further instructions. That message may come at any time, or may not come for a decade. These days you would make that contact via the internet, and sure, it would be more practical to hide the "make contact" signal on the internet too, but shortwave is a longstanding tech with known operating parameters.
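    The "hiding the messages in the noise of the image bits" part is classic least-significant-bit steganography. A minimal sketch of the idea on raw pixel bytes (pure illustration; nothing here comes from the comment's scenario):

    ```python
    def embed(pixels: bytearray, message: bytes) -> bytearray:
        """Hide each message bit in the lowest bit of successive pixel bytes."""
        out = bytearray(pixels)
        for i, byte in enumerate(message):
            for j in range(8):
                out[i * 8 + j] = (out[i * 8 + j] & 0xFE) | ((byte >> j) & 1)
        return out

    def extract(pixels: bytearray, length: int) -> bytes:
        """Read the low bits back out to recover `length` bytes of message."""
        msg = bytearray(length)
        for i in range(length):
            for j in range(8):
                msg[i] |= (pixels[i * 8 + j] & 1) << j
        return bytes(msg)

    pixels = bytearray(range(256)) * 2   # stand-in for real image data
    stego = embed(pixels, b"3 17 22")    # the comment's hypothetical trigger sequence
    assert extract(stego, 7) == b"3 17 22"
    ```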
  • The Decline of Usability: Revisited | datagubbe.se

    Technology
    0 votes
    2 posts
    13 views
    2xsaiko@discuss.tchncs.de
    Just saw this article linked in a ThePrimeagen video. I didn't watch the video, but I did read the article, and it says exactly what I'm always arguing when I complain about current UI trends and why I'm so picky about the software I use, and about the tools I use to write software. I shouldn't have to be picky, but it seems like developers (professional and hobbyist alike) don't care anymore and users don't have standards.
  • 51 votes
    8 posts
    36 views
    But do you also sometimes leave out AI for the steps it often does for you, like the conceptualisation or the implementation? Would it be possible for you to do these steps as efficiently as before you used AI? Would you be able to spot the mistakes the AI makes in these steps, even months or years down the line? The main issue I have with AI being used in tasks is that it deprives you of applying logic to real-life scenarios, the thing we excel at. It would be better to use AI in the opposite direction from how you currently use it: develop methods to view the work critically. After all, if there is one thing a lot of people are bad at, it's thorough critical thinking. We just suck at knowing all the edge cases and how to test for them. Let the AI come up with unit tests, let it be the one that questions your work, in order to get a better perspective on it.
  • Converting An E-Paper Photo Frame Into Weather Map

    Technology
    113 votes
    2 posts
    17 views
    indibrony@lemmy.world
    Looks like East Anglia has basically disappeared. At least nothing of value was lost
  • Websites Are Tracking You Via Browser Fingerprinting

    Technology
    296 votes
    41 posts
    167 views
    Makes you question how digital stalking is still allowed.
  • Microsoft pulls MS365 Business Premium from nonprofits

    Technology
    48 votes
    37 posts
    139 views
    That's the thing: I wish we could just switch all enterprises to Linux, but Microsoft has developed a huge ecosystem that really does have good features. Unless something comparable comes up in the Linux world, I don't see Europe becoming independent of Microsoft any time soon.