linux-nerds.org

AI agents wrong ~70% of time: Carnegie Mellon study

Technology

272 posts 107 commenters 79 views

P punkwalrus@lemmy.world

I'd compare LLMs to a junior executive. Probably gets the basic stuff right, but check and verify for anything important or complicated. Break tasks down into easier steps.
zbyte64@awful.systems

wrote, last edited by zbyte64@awful.systems

#128

A junior developer actually learns from doing the job, an LLM only learns when they update the training corpus and develop an updated model.
1 reply

3
Z zbyte64@awful.systems

It’s usually vastly easier to verify an answer than posit one, if you have the patience to do so.

I usually write 3x the code to test the code itself. Verification is often harder than implementation.
mangocats@feddit.it

wrote

#129

Yes, but the test code "writes itself" - the path is clear, you just have to fill in the blanks.

Writing the proper product code in the first place, that's the valuable challenge.
1 reply

0
E eli001@lemmy.world

This post did not contain any content.
katana314@lemmy.world

wrote

#130

I'm in a workplace that has tried not to be overbearing about AI, but has encouraged us to use it for coding.

I've tried to give mine some very simple tasks like writing a unit test just for the constructor of a class to verify current behavior, and it generates output that's both wrong and doesn't verify anything.
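For reference, a constructor-only test that pins down current behavior can be quite small. The sketch below is only an illustration: the `Account` class, its fields, and the `run_constructor_tests` helper are hypothetical names invented here, not anything from the poster's codebase.

```python
import unittest


class Account:
    """Hypothetical class under test; not taken from the thread."""

    def __init__(self, owner, balance=0):
        if balance < 0:
            raise ValueError("balance must be non-negative")
        self.owner = owner
        self.balance = balance


class AccountConstructorTest(unittest.TestCase):
    """Checks only what the constructor does today."""

    def test_stores_arguments(self):
        account = Account("alice", 10)
        self.assertEqual(account.owner, "alice")
        self.assertEqual(account.balance, 10)

    def test_default_balance_is_zero(self):
        self.assertEqual(Account("bob").balance, 0)

    def test_rejects_negative_balance(self):
        with self.assertRaises(ValueError):
            Account("carol", -1)


def run_constructor_tests():
    """Run the suite programmatically and return the TestResult."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(AccountConstructorTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Each test asserts on an observable result of construction, which is the part the generated output reportedly skipped.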

I'm aware it sometimes gets better with more intricate, specific instructions, and that I can offer it further corrections, but at that point it's not even saving time. I would do this with a human in the hopes that they would retain the knowledge, but I don't even have hopes for AI to apply those lessons in new contexts. In a way, it's been a relief to realize that, just like Dotcom, just like 3D TVs, just like home smart assistants, it is a bubble.
3 replies

34
S shayeta@feddit.org

How do I set up event driven document ingestion from OneDrive located on an Azure tenant to Amazon DocumentDB? Ingestion must be near-realtime, durable, and have some form of DLQ.
tja@programming.dev

wrote

#131

DocumentDB is not for OneDrive documents (PDFs and such). It's for "documents" as in serialized objects (JSON or BSON).
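To make the distinction concrete, here is a minimal sketch of a "document" in that sense: a nested object serialized to JSON and parsed back. The record and its field names are made up for illustration, and nothing here talks to DocumentDB itself.

```python
import json

# Hypothetical record; the field names are illustrative, not from the thread.
order = {
    "_id": "order-1001",
    "customer": {"name": "Ada", "email": "ada@example.com"},
    "items": [
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1},
    ],
}

# Serialize the nested object to JSON text, the kind of "document" meant here.
serialized = json.dumps(order, sort_keys=True)

# Parse it back into an equivalent nested object and read a field from it.
restored = json.loads(serialized)
total_qty = sum(item["qty"] for item in restored["items"])
```

A PDF sitting in OneDrive is a binary file, so it would need to be parsed into a structure like this (or stored elsewhere and only referenced) before it fits that model.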
1 reply

2
K kameecoding@lemmy.world

Do you use an IDE for writing your code or do you use a notepad like a "real" programmer?
An IDE like IntelliJ has fancy shit like generating getters, setters, constructors, equals, and hashCode; you should never use those, real programmers write those by hand.

Your attention to detail is very good btw, which I am ofc being sarcastic about, because if you had any you'd have noticed I never said I write my code with ChatGPT; I said unit tests, and SQL for unit tests.

Ofc attention to detail is not a requirement of software engineering so you should be good. (This was also sarcasm; I feel like you need this pointed out for you.)

Also by your implied logic that the code being not written by you = bad, no company should ever hire Junior engineers, I mean what are you gonna do? Fucking read the code they wrote?
nalivai@discuss.tchncs.de

wrote

#132

Were you prone to these weird leaps of logic before your brain was fried by talking to LLMs, or did you start being a fan of talking to LLMs because your ability to logic was...well...that?
1 reply

0
T timeworntraveler@lemmy.dbzer0.com

AI can't even understand its own brain to write about it
tja@programming.dev

wrote

#133

Neither can we...
1 reply

0
S suburban_hillbilly@lemmy.ml

Gell-Mann amnesia effect - Wikipedia

(en.m.wikipedia.org)
melvin_ferd@lemmy.world

wrote

#134

Whoa that's like how many colors there are
1 reply

3
S some_guy@lemmy.sdf.org

Yeah, they’re statistical word generators. There’s no intelligence. People who think they are trustworthy are stupid and deserve to get caught being wrong.
alteredego@lemmy.ml

wrote

#135

Emotion > Facts. Most people have been trained to blindly accept things and cheer on what fits with their agenda. Like tech bros exaggerating LLMs, or people like you misrepresenting LLMs as mere statistical word generators without intelligence. That's like saying a computer is just wires and switches, or missing the forest for the trees. Both are equally false.

Yet if it fits with the emotional needs or with dogma, then others will agree. It's a convenient and comforting "A vs B" worldview we've been trained to accept. And so the satisfying notion and misinformation keep spreading.

LLMs tell us more about human intelligence and the human slop we've been generating. They tell us that most people are not that much more than statistical word generators.
2 replies

2
T timeworntraveler@lemmy.dbzer0.com

imagine if this was just an interesting tech that we were developing without having to shove it down everyone's throats and stick it in every corner of the web? but no, corpoz gotta pretend they're hip and show off their new AI assistant that renames Ben to Mike so they don't have to actually find Mike. capitalism ruins everything.
mangocats@feddit.it

wrote

#136

There's a certain amount of: "if this isn't going to take over the world, I'm going to just take my money and put it in something that will" mentality out there. It's not 100% of all investors, but it's pervasive enough that the "potential world beaters" are seriously over-funded as compared to their more modest reliable inflation+10% YoY return alternatives.
1 reply

5
O outhouseperilous@lemmy.dbzer0.com

Those are people who could be living their lives, pursuing their ambitions, whatever. That could get some shit done. Comparison not valid.
honytawk@feddit.nl

wrote

#137

The comparison is about the correctness of their work.

Their lives have nothing to do with it.
2 replies

1
T timeworntraveler@lemmy.dbzer0.com

that is such a ridiculous idea. Just because you see hate for it in the media doesn't mean it originated there. I'll have you know that I have embarrassed myself by screaming at robot phone receptionists for years now. stupid fuckers pretending to be people but not knowing shit. I was born ready to hate LLMs and I'm not gonna have you claim that CNN made me do it.
melvin_ferd@lemmy.world

wrote

#138

Search "AI" in Lemmy and check out every article on it. It definitely is media spreading all the hate. And, like this article, it is often just money-driven yellow journalism.
2 replies

0
K katana314@lemmy.world

I'm in a workplace that has tried not to be overbearing about AI, but has encouraged us to use it for coding.

I've tried to give mine some very simple tasks like writing a unit test just for the constructor of a class to verify current behavior, and it generates output that's both wrong and doesn't verify anything.

I'm aware it sometimes gets better with more intricate, specific instructions, and that I can offer it further corrections, but at that point it's not even saving time. I would do this with a human in the hopes that they would retain the knowledge, but I don't even have hopes for AI to apply those lessons in new contexts. In a way, it's been a relief to realize that, just like Dotcom, just like 3D TVs, just like home smart assistants, it is a bubble.
mangocats@feddit.it

wrote

#139

The first half dozen times I tried AI for code, across the past year or so, it failed pretty much as you describe.

Finally, I hit on some things it can do. For me: keeping the instructions more general, not specifying certain libraries for instance, was the key to getting something that actually does something. Also, if it doesn't show you the whole program, get it to show you the whole thing, and make it fix its own mistakes so you can build on working code with later requests.
2 replies

4
O outhouseperilous@lemmy.dbzer0.com

No, it matters. You're pushing the lie they want pushed.
honytawk@feddit.nl

wrote, last edited by honytawk@feddit.nl

#140

And you're pushing a hate train with no aspect of nuance to show for it.

Seems like you are even less than 30% useful. And that is mainly because you can be used as fertilizer.
1 reply

0
M morto@piefed.social

and doesn't need to be exactly right

What kind of tasks do you consider that don't need to be exactly right?
honytawk@feddit.nl

wrote, last edited by honytawk@feddit.nl

#141

Description generators for TTRPGs, as you will read through them afterwards anyway and correct when necessary.

Generating lists of ideas. For creative writing, getting a bunch of ideas you can pick and choose from that fit the narrative you want.

A search engine like Perplexity.ai which after searching summarizes the web page and adds a link to the page next to it. If the summary seems promising, you go to the real page to verify the actual information.

Simple code like HTML pages and boilerplate code that you will still review afterwards anyway.
1 reply

1
Z zbyte64@awful.systems

When LLMs get it right it's because they're summarizing a Stack Overflow or GitHub snippet they were trained on. But you lose all the benefits of other humans commenting on the context, pitfalls and other alternatives.
honytawk@feddit.nl

wrote

#142

You mean things you had to do anyway even if you didn't use LLMs?
1 reply

0
D dylanmorgan@slrpnk.net

That’s literally how “AI agents” are being marketed. “Tell it to do a thing and it will do it for you.”
honytawk@feddit.nl

wrote

#143

So? That doesn't mean they are supposed to be used like that.

Show me any marketing that isn't full of lies.
1 reply

0
M mangocats@feddit.it

The first half dozen times I tried AI for code, across the past year or so, it failed pretty much as you describe.

Finally, I hit on some things it can do. For me: keeping the instructions more general, not specifying certain libraries for instance, was the key to getting something that actually does something. Also, if it doesn't show you the whole program, get it to show you the whole thing, and make it fix its own mistakes so you can build on working code with later requests.
vivendi@programming.dev

wrote

#144

Have you tried insulting the AI in the system prompt (as well as other tweaks to the system prompt)?

I'm not joking, it really works

For example:

Instead of "You are an intelligent coding assistant..."

"You are an absolute fucking idiot who can barely code..."
2 replies

7
A alteredego@lemmy.ml

Emotion > Facts. Most people have been trained to blindly accept things and cheer on what fits with their agenda. Like tech bros exaggerating LLMs, or people like you misrepresenting LLMs as mere statistical word generators without intelligence. That's like saying a computer is just wires and switches, or missing the forest for the trees. Both are equally false.

Yet if it fits with the emotional needs or with dogma, then others will agree. It's a convenient and comforting "A vs B" worldview we've been trained to accept. And so the satisfying notion and misinformation keep spreading.

LLMs tell us more about human intelligence and the human slop we've been generating. They tell us that most people are not that much more than statistical word generators.
some_guy@lemmy.sdf.org

wrote

#145

people like you misrepresenting LLMs as mere statistical word generators without intelligence.

You've bought into the hype. I won't try to argue with you because you aren't cognizant of reality.
1 reply

5
E eli001@lemmy.world

This post did not contain any content.
blackmist@feddit.uk

wrote

#146

We have created the overconfident intern in digital form.
1 reply

38
Z zbyte64@awful.systems

When LLMs get it right it's because they're summarizing a Stack Overflow or GitHub snippet they were trained on. But you lose all the benefits of other humans commenting on the context, pitfalls and other alternatives.
potentialproblem@sh.itjust.works

wrote

#147

You’re not wrong, but often I’m just trying to do something I’ve done a thousand times before and I already know the pitfalls. Also, I’m sure I’ve copied code from Stack Overflow before.
1 reply

0
