AI agents wrong ~70% of time: Carnegie Mellon study
-
This post did not contain any content.
Imagine if this were just an interesting tech that we were developing without having to shove it down everyone's throats and stick it in every corner of the web? But no, corpos gotta pretend they're hip and show off their new AI assistant that renames Ben to Mike so they don't have to actually find Mike. Capitalism ruins everything.
-
It doesn't matter if you need a human to review. AI has no way of distinguishing between success and failure, so either way a human will have to review 100% of those tasks.
I have been using AI to write (little, near-trivial) programs. It's blindingly obvious that it could be feeding this code to a compiler and catching its mistakes before giving it to me, but it doesn't... yet.
-
A human can review something that's close to correct a lot more easily than they can start the task from zero.
In university I knew a lot of students who knew all the material but "just didn't know where to start" - if I gave them a little direction about where to start, they could run it to the finish all on their own.
-
AI can't even understand its own brain well enough to write about it.
-
It is a lot harder to notice incorrect information in review than to make sure it is correct when writing it.
Depends on the context. There is a lot of work in the scientific methods community trying to use NLP to augment traditionally fully human processes such as thematic analysis and systematic literature reviews, and you can have validation protocols there without 100% human review.
-
Are you guys sure? The media seems to be where a lot of LLM hate originates.
That is such a ridiculous idea. Just because you see hate for it in the media doesn't mean it originated there. I'll have you know that I have embarrassed myself by screaming at robot phone receptionists for years now. Stupid fuckers pretending to be people but not knowing shit. I was born ready to hate LLMs, and I'm not gonna have you claim that CNN made me do it.
-
It is a lot harder to notice incorrect information in review than to make sure it is correct when writing it.
That depends entirely on your writing method and attention span for review.
Most people make stuff up off the cuff and skim anything longer than 75 words when reviewing, so the bar for AI improving over that is really low.
-
Wrong 70% of the time doing what?
I’ve used LLMs as a Stack Overflow / MSDN replacement for over a year and if they fucked up 7/10 questions I’d stop.
Same with code: any free model can easily generate simple scripts and utilities with maybe a 10% error rate, definitely not 70%.
It specifies the tasks in the article.
-
The person who uses fancy autocomplete to write their code will be exactly the person who thinks they're better than everyone. Those traits are correlated.
Do you use an IDE for writing your code or do you use a notepad like a "real" programmer?
An IDE like IntelliJ has fancy shit like generating getters, setters, constructors, equals/hashCode; you should never use those, real programmers write those by hand. Your attention to detail is very good btw, which I am ofc being sarcastic about, because if you had any you'd have noticed I never said I write my code with ChatGPT. I said unit tests, and SQL for unit tests.
Ofc attention to detail is not a requirement of software engineering, so you should be good. (This was also sarcasm; I feel like that needs to be pointed out for you.)
Also, by your implied logic that code not written by you = bad, no company should ever hire junior engineers. I mean, what are you gonna do? Fucking read the code they wrote?
-
Claude, why did you make me an appointment with a gynecologist? I need an appointment with my neurologist; I'm a man and I have Parkinson's.
Got it, changing your gender to female. Is there anything else I can assist you with?
-
Right, so this is really only useful in cases where either it's vastly easier to verify an answer than to posit one, or a conventional program can verify the result of the AI's output.
It’s usually vastly easier to verify an answer than posit one, if you have the patience to do so.
I'm envisioning a world where multiple AI engines create and check each other's work... the first thing they need to make work to support that scenario is probably fusion power.
-
You get how that's fucking useless, generally?
As useless as a cubicle farm full of unsupervised workers.
-
Run something with a 70% failure rate 10x and you get to a cumulative pass rate of about 97%.
LLMs don't get tired and they can be run in parallel.
I have actually been doing this lately: iteratively prompting AI to write software and fix its errors until something useful comes out. It's a lot like machine translation. I speak fluent C++ but I don't speak Rust, yet I can hammer away at the AI (with English-language prompts) until it produces passable Rust for something I could write myself in C++ in half the time and effort.
I also don't speak Finnish, but Google Translate can take what I say in English and put it into at least somewhat comprehensible Finnish without egregious translation errors most of the time.
Is this useful? When C++ is getting banned for "security concerns" and Rust is the required language, it's at least a little helpful.
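For what it's worth, the back-of-the-envelope math behind that pass-rate claim looks like this; it's just a sketch and assumes the attempts are independent, which real LLM retries often aren't:

```python
# Cumulative chance that at least one of n independent attempts succeeds,
# given a 70% per-attempt failure rate. 1 - 0.7**10 ≈ 0.97, which is
# where the ~97% figure above comes from.
def cumulative_pass_rate(p_fail: float, attempts: int) -> float:
    return 1 - p_fail ** attempts

for n in (1, 3, 5, 10):
    print(n, round(cumulative_pass_rate(0.7, n), 3))
```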
-
As useless as a cubicle farm full of unsupervised workers.
Those are people who could be living their lives, pursuing their ambitions, whatever. They could get some shit done. Comparison not valid.
-
It’s usually vastly easier to verify an answer than posit one, if you have the patience to do so.
I'm envisioning a world where multiple AI engines create and check each other's work... the first thing they need to make work to support that scenario is probably fusion power.
I usually write 3x the code to test the code itself. Verification is often harder than implementation.
-
I'd compare LLMs to a junior executive. Probably gets the basic stuff right, but check and verify for anything important or complicated. Break tasks down into easier steps.
A junior developer actually learns from doing the job; an LLM only learns when its training corpus is updated and a new model is trained.
-
It’s usually vastly easier to verify an answer than posit one, if you have the patience to do so.
I usually write 3x the code to test the code itself. Verification is often harder than implementation.
Yes, but the test code "writes itself" - the path is clear, you just have to fill in the blanks.
Writing the proper product code in the first place, that's the valuable challenge.
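Concretely, the "blanks" are just the arrange/act/assert slots. A minimal sketch of that shape (Calculator is a made-up stand-in, not anyone's real code):

```python
import unittest

# Hypothetical class under test, just to show the shape of the blanks.
class Calculator:
    def add(self, a: int, b: int) -> int:
        return a + b

class TestCalculator(unittest.TestCase):
    def test_add(self):
        calc = Calculator()           # arrange: build the thing under test
        result = calc.add(2, 3)       # act: call the code being verified
        self.assertEqual(result, 5)   # assert: state the expected outcome

if __name__ == "__main__":
    unittest.main()
```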
-
This post did not contain any content.
I'm in a workplace that has tried not to be overbearing about AI, but has encouraged us to use it for coding.
I've tried to give mine some very simple tasks, like writing a unit test just for the constructor of a class to verify current behavior, and it generates output that's wrong and doesn't verify anything.
I'm aware it sometimes gets better with more intricate, specific instructions, and that I can offer it further corrections, but at that point it's not even saving time. I would do this with a human in the hope that they would retain the knowledge, but I don't even have hopes for the AI to apply those lessons in new contexts. In a way, it's been a sigh of relief to realize that, just like the dotcom boom, just like 3D TVs, just like home smart assistants, it is a bubble.
-
How do I set up event-driven document ingestion from OneDrive located on an Azure tenant to Amazon DocumentDB? Ingestion must be near-real-time, durable, and have some form of DLQ.
DocumentDB is not for OneDrive documents (PDFs and such). It's for "documents" as in serialized objects (JSON or BSON).
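In other words, a DocumentDB "document" is a JSON/BSON record you work with over the MongoDB API, not a file. A rough sketch of what actually gets inserted (the connection string, names, and fields here are all made up):

```python
from pymongo import MongoClient

# DocumentDB speaks the MongoDB wire protocol; you store JSON/BSON objects.
client = MongoClient("mongodb://user:pass@docdb-cluster.example:27017/?tls=true")
collection = client["ingestion"]["records"]

# You'd insert extracted metadata/text from the OneDrive file, not the PDF bytes.
collection.insert_one({
    "source": "onedrive",
    "file_name": "report.pdf",
    "extracted_text": "...",
    "ingested_at": "2024-01-01T00:00:00Z",
})
```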
-
Do you use an IDE for writing your code or do you use a notepad like a "real" programmer?
An IDE like IntelliJ has fancy shit like generating getters, setters, constructors, equals/hashCode; you should never use those, real programmers write those by hand. Your attention to detail is very good btw, which I am ofc being sarcastic about, because if you had any you'd have noticed I never said I write my code with ChatGPT. I said unit tests, and SQL for unit tests.
Ofc attention to detail is not a requirement of software engineering, so you should be good. (This was also sarcasm; I feel like that needs to be pointed out for you.)
Also, by your implied logic that code not written by you = bad, no company should ever hire junior engineers. I mean, what are you gonna do? Fucking read the code they wrote?
Were you prone to these weird leaps of logic before your brain was fried by talking to LLMs, or did you start being a fan of talking to LLMs because your ability to logic was... well... that?