linux-nerds.org

AI agents wrong ~70% of time: Carnegie Mellon study

Technology

277 posts 108 commenters 90 views

D dreamlandlividity@lemmy.world

It is a lot harder to notice incorrect information in review, than making sure it is correct when writing it.
M This user is from outside of this forum
mangocats@feddit.it

wrote, last edited by

#119

harder to notice incorrect information in review, than making sure it is correct when writing it.

That depends entirely on your writing method and attention span for review.

Most people make stuff up off the cuff and skim anything longer than 75 words when reviewing, so the bar for AI improving over that is really low.
1 reply Last reply

1
L lepinkainen@lemmy.world

Wrong 70% doing what?

I’ve used LLMs as a Stack Overflow / MSDN replacement for over a year and if they fucked up 7/10 questions I’d stop.

Same with code, any free model can easily generate simple scripts and utilities with maybe 10% error rate, definitely not 70%
T This user is from outside of this forum
timeworntraveler@lemmy.dbzer0.com

wrote, last edited by

#120

it specifies the tasks in the article
1 reply Last reply

0
N nalivai@discuss.tchncs.de

The person who uses fancy autocomplete to write their code will be exactly the person who thinks they're better than everyone. Those traits are correlated.
K This user is from outside of this forum
kameecoding@lemmy.world

wrote, last edited by kameecoding@lemmy.world

#121

Do you use an IDE for writing your code or do you use a notepad like a "real" programmer?
An IDE like IntelliJ has fancy shit like generating getters, setters, constructors, equals, hashCode; you should never use those, real programmers write those by hand.

Your attention to detail is very good btw, which I am ofc being sarcastic about because if you had any you'd have noticed I have never said I write my code with ChatGPT; I said unit tests, SQL for unit tests.

Ofc attention to detail is not a requirement of software engineering so you should be good. (This was also sarcasm I feel like you need this to be pointed out for you).

Also by your implied logic that the code being not written by you = bad, no company should ever hire Junior engineers, I mean what are you gonna do? Fucking read the code they wrote?
N 1 reply Last reply

1
D dylanmorgan@slrpnk.net

Claude why did you make me an appointment with a gynecologist? I need an appointment with my neurologist, I’m a man and I have Parkinson’s.
T This user is from outside of this forum
timeworntraveler@lemmy.dbzer0.com

wrote, last edited by timeworntraveler@lemmy.dbzer0.com

#122

Got it, changing your gender to female. Is there anything else I can assist you with?
1 reply Last reply

0
J jsomae@lemmy.ml

Right, so this is really only useful in cases where either it's vastly easier to verify an answer than posit one, or if a conventional program can verify the result of the AI's output.
M This user is from outside of this forum
mangocats@feddit.it

wrote, last edited by

#123

It’s usually vastly easier to verify an answer than posit one, if you have the patience to do so.

I'm envisioning a world where multiple AI engines create and check each others' work... the first thing they need to make work to support that scenario is probably fusion power.
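The "generate, then let a conventional program verify" idea from this exchange can be sketched in a few lines. This is only an illustration under stated assumptions: `propose` is a hypothetical stand-in for any unreliable generator (an LLM call, say), and sorting is chosen as the task because checking a sorted list is clearly cheaper than producing one. None of these names come from the thread or the study.

```python
from collections import Counter
from typing import Callable, Optional


def is_sorted_permutation(original: list[int], candidate: list[int]) -> bool:
    """Cheap verifier: candidate is in order and has the same elements as original."""
    in_order = all(a <= b for a, b in zip(candidate, candidate[1:]))
    return in_order and Counter(original) == Counter(candidate)


def first_verified(
    propose: Callable[[], list[int]],
    verify: Callable[[list[int]], bool],
    max_attempts: int,
) -> Optional[list[int]]:
    """Ask the (hypothetical) unreliable generator up to max_attempts times.

    Returns the first candidate the verifier accepts, or None if all fail.
    """
    for _ in range(max_attempts):
        candidate = propose()
        if verify(candidate):
            return candidate
    return None


# Demo with a scripted "generator" that is wrong twice before being right.
data = [3, 1, 2]
scripted_answers = iter([[1, 2, 2], [3, 2, 1], [1, 2, 3]])
result = first_verified(
    propose=lambda: next(scripted_answers),
    verify=lambda c: is_sorted_permutation(data, c),
    max_attempts=3,
)
print(result)
```

The loop is only as trustworthy as the verifier: if no cheap, reliable check exists for the task, repeated generation does not help.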
Z 1 reply Last reply

2
O outhouseperilous@lemmy.dbzer0.com

You get how that's fucking useless, generally?
M This user is from outside of this forum
mangocats@feddit.it

wrote, last edited by

#124

As useless as a cubicle farm full of unsupervised workers.
O 1 reply Last reply

1
K knock_knock_lemmy_in@lemmy.world

Run something with a 70% failure rate 10x and you get to a cumulative ~97% pass rate.
LLMs don't get tired and they can be run in parallel.
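As a rough check of that arithmetic, here is a minimal sketch. It assumes every run fails independently with the same probability and that something can reliably tell a passing run from a failing one; neither assumption is established in the thread. The function names are made up for illustration.

```python
import math


def cumulative_pass_rate(failure_rate: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent runs succeeds."""
    return 1.0 - failure_rate ** attempts


def attempts_needed(failure_rate: float, target: float) -> int:
    """Smallest number of independent runs whose cumulative pass rate reaches target."""
    return math.ceil(math.log(1.0 - target) / math.log(failure_rate))


print(f"{cumulative_pass_rate(0.7, 10):.4f}")  # 1 - 0.7**10
print(attempts_needed(0.7, 0.98))
```

Under these assumptions ten runs give about 97.2%, and an eleventh run is needed to pass 98%. If failures are correlated (the same prompt tripping the model the same way each time), the real figure would be lower.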
M This user is from outside of this forum
mangocats@feddit.it

wrote, last edited by

#125

I have actually been doing this lately: iteratively prompting AI to write software and fix its errors until something useful comes out. It's a lot like machine translation. I speak fluent C++, but I don't speak Rust, but I can hammer away on the AI (with English language prompts) until it produces passable Rust for something I could write for myself in C++ in half the time and effort.

I also don't speak Finnish, but Google Translate can take what I say in English and put it into at least somewhat comprehensible Finnish without egregious translation errors most of the time.

Is this useful? When C++ is getting banned for "security concerns" and Rust is the required language, it's at least a little helpful.
J 1 reply Last reply

2
M mangocats@feddit.it

As useless as a cubicle farm full of unsupervised workers.
O This user is from outside of this forum
outhouseperilous@lemmy.dbzer0.com

wrote, last edited by

#126

Those are people who could be living their lives, pursuing their ambitions, whatever. That could get some shit done. Comparison not valid.
H 1 reply Last reply

4
M mangocats@feddit.it

It’s usually vastly easier to verify an answer than posit one, if you have the patience to do so.

I'm envisioning a world where multiple AI engines create and check each others' work... the first thing they need to make work to support that scenario is probably fusion power.
Z This user is from outside of this forum
zbyte64@awful.systems

wrote, last edited by

#127

It’s usually vastly easier to verify an answer than posit one, if you have the patience to do so.

I usually write 3x the code to test the code itself. Verification is often harder than implementation.
M J 2 replies Last reply

2
P punkwalrus@lemmy.world

I'd compare LLMs to a junior executive. Probably gets the basic stuff right, but check and verify for anything important or complicated. Break tasks down into easier steps.
Z This user is from outside of this forum
zbyte64@awful.systems

wrote, last edited by zbyte64@awful.systems

#128

A junior developer actually learns from doing the job, an LLM only learns when they update the training corpus and develop an updated model.
J 1 reply Last reply

3
Z zbyte64@awful.systems

It’s usually vastly easier to verify an answer than posit one, if you have the patience to do so.

I usually write 3x the code to test the code itself. Verification is often harder than implementation.
M This user is from outside of this forum
mangocats@feddit.it

wrote, last edited by

#129

Yes, but the test code "writes itself" - the path is clear, you just have to fill in the blanks.

Writing the proper product code in the first place, that's the valuable challenge.
Z 1 reply Last reply

0
E eli001@lemmy.world

This post did not contain any content.
K This user is from outside of this forum
katana314@lemmy.world

wrote, last edited by

#130

I'm in a workplace that has tried not to be overbearing about AI, but has encouraged us to use them for coding.

I've tried to give mine some very simple tasks like writing a unit test just for the constructor of a class to verify current behavior, and it generates output that's both wrong and doesn't verify anything.

I'm aware it sometimes gets better with more intricate, specific instructions, and that I can offer it further corrections, but at that point it's not even saving time. I would do this with a human in the hopes that they would continue to retain the knowledge, but I don't even have hopes for AI to apply those lessons in new contexts. In a way, it's been a sigh of relief to realize just like Dotcom, just like 3D TVs, just like home smart assistants, it is a bubble.
M R J 3 replies Last reply

34
S shayeta@feddit.org

How do I set up event driven document ingestion from OneDrive located on an Azure tenant to Amazon DocumentDB? Ingestion must be near-realtime, durable, and have some form of DLQ.
T This user is from outside of this forum
tja@programming.dev

wrote, last edited by

#131

DocumentDB is not for OneDrive documents (PDFs and such). It's for "documents" as in serialized objects (JSON or BSON).
S 1 reply Last reply

2
K kameecoding@lemmy.world

Do you use an IDE for writing your code or do you use a notepad like a "real" programmer?
An IDE like IntelliJ has fancy shit like generating getters, setters, constructors, equals, hashCode; you should never use those, real programmers write those by hand.

Your attention to detail is very good btw, which I am ofc being sarcastic about because if you had any you'd have noticed I have never said I write my code with ChatGPT; I said unit tests, SQL for unit tests.

Ofc attention to detail is not a requirement of software engineering so you should be good. (This was also sarcasm I feel like you need this to be pointed out for you).

Also by your implied logic that the code being not written by you = bad, no company should ever hire Junior engineers, I mean what are you gonna do? Fucking read the code they wrote?
N This user is from outside of this forum
nalivai@discuss.tchncs.de

wrote, last edited by

#132

Were you prone to these weird leaps of logic before your brain was fried by talking to LLMs, or did you start being a fan of talking to LLMs because your ability to reason was...well...that?
K 1 reply Last reply

0
T timeworntraveler@lemmy.dbzer0.com

AI can't even understand its own brain to write about it
T This user is from outside of this forum
tja@programming.dev

wrote, last edited by

#133

Neither can we...
T 1 reply Last reply

0
S suburban_hillbilly@lemmy.ml

Gell-Mann amnesia effect - Wikipedia

(en.m.wikipedia.org)
M This user is from outside of this forum
melvin_ferd@lemmy.world

wrote, last edited by

#134

Whoa that's like how many colors there are
1 reply Last reply

3
S some_guy@lemmy.sdf.org

Yeah, they’re statistical word generators. There’s no intelligence. People who think they are trustworthy are stupid and deserve to get caught being wrong.
A This user is from outside of this forum
alteredego@lemmy.ml

wrote, last edited by

#135

Emotion > Facts. Most people have been trained to blindly accept things and cheer on what fits with their agenda. Like tech bros exaggerating LLMs, or people like you misrepresenting LLMs as mere statistical word generators without intelligence. That's like saying a computer is just wires and switches, or missing the forest for the trees. Both are equally false.

Yet if it fits with emotional needs or with dogma, then others will agree. It's a convenient and comforting "A vs B" worldview we've been trained to accept. And so the satisfying notion and misinformation keep spreading.

LLMs tell us more about human intelligence and the human slop we've been generating. They tell us that most people are not that much more than statistical word generators.
S S 2 replies Last reply

2
T timeworntraveler@lemmy.dbzer0.com

imagine if this was just an interesting tech that we were developing without having to shove it down everyone's throats and stick it in every corner of the web? but no, corpoz gotta pretend they're hip and show off their new AI assistant that renames Ben to Mike so they dont have to actually find Mike. capitalism ruins everything.
M This user is from outside of this forum
mangocats@feddit.it

wrote, last edited by

#136

There's a certain amount of: "if this isn't going to take over the world, I'm going to just take my money and put it in something that will" mentality out there. It's not 100% of all investors, but it's pervasive enough that the "potential world beaters" are seriously over-funded as compared to their more modest reliable inflation+10% YoY return alternatives.
1 reply Last reply

5
O outhouseperilous@lemmy.dbzer0.com

Those are people who could be living their lives, pursuing their ambitions, whatever. That could get some shit done. Comparison not valid.
H This user is from outside of this forum
honytawk@feddit.nl

wrote, last edited by

#137

The comparison is about the correctness of their work.

Their lives have nothing to do with it.
D O 2 replies Last reply

1
T timeworntraveler@lemmy.dbzer0.com

that is such a ridiculous idea. Just because you see hate for it in the media doesn't mean it originated there. I'll have you know that i have embarrassed myself by screaming at robot phone receptionists for years now. stupid fuckers pretending to be people but not knowing shit. I was born ready to hate LLMs and I'm not gonna have you claim that CNN made me do it.
M This user is from outside of this forum
melvin_ferd@lemmy.world

wrote, last edited by

#138

Search "AI" in Lemmy and check out every article on it. It definitely is media spreading all the hate. And like this article, it is often some money-driven yellow journalism.
D T 2 replies Last reply

0


S

Spotify X Mod APK
Technology
1

2

1 vote

1 post

0 views

Nobody has replied
2

The Decline of Usability: Revisited | datagubbe.se
Technology
8

67 votes

8 posts

36 views

R

I blame the idea of the 00s and 10s that there should be some "Zen" in computer UIs and that "Zen" is doing things wrong with the arrogant tone of "you don't understand it". Associated with Steve Jobs, but TBH Google as well. And also another idea of "you dummy talking about ergonomics can't be smarter than this big respectable corporation popping out stylish unusable bullshit". So - pretense of wisdom and taste, under which crowd fashion is masked, almost aggressive preference for authority over people actually having maybe some wisdom and taste due to being interested in that, blind trust in whatever tech authority you chose for yourself, because, if you remember, in the 00s it was still perceived as if all people working in anything connected to computers were as cool as aerospace engineers or naval engineers, some kind of elite, including those making user applications, objective flaw (or upside) of the old normal UIs - they are boring, that's why UIs in video games and in fashionable chat applications (like ICQ and Skype), not talking about video and audio players, were non-standard like always, I think the solution would be in per-application theming, not in breaking paradigms, again, like with ICQ and old Skype and video games, I prefer it when boredom is fought with different applications having different icons and colors, but the UI paradigm remains the same, I think there was a themed IE called LOTR browser which I used (ok, not really, I used Opera) to complement ICQ, QuickTime player and BitComet, all mentioned had standard paradigm and non-standard look.
P

Google’s Advanced Protection Arrives on Android: Should You Use It?
Technology
17

1

61 votes

17 posts

64 views

A

I’ll probably never trust anything they’ve touched until I’ve taken it apart and put it back together again. Me too. But the vast majority of users need guardrails, and have a different threat model. Even those that also care about privacy, if they just want a solution that comes by default, this adtech 'fake' or 'superficial' solution does provide something. And anything is more than nothing.
K

AJWIN: The Online Entertainment Revolution in Your Hands
Technology
1

1

0 votes

1 post

7 views

Nobody has replied
A

FBI Wants Access To Encrypted iPhone And Android Data—So Does Europe
Technology
38

1

175 votes

38 posts

140 views

W

It's not a back door, it's just a rear entryway
P

Community Notes vanishes from X feeds, raising 'serious questions' amid ongoing EU probe
Technology
94

1

462 votes

94 posts

270 views

L

Make them publishers or whatever is required to have it be a legal requirement, have them ban people who share false information. The law doesn't magically make open discussions not open. By design, social media is open. If discussion from the public is closed, then it's no longer social media. ban people who share false information Banning people doesn't stop falsehoods. It's a broken solution promoting a false assurance. Authorities are still fallible & risk banning over unpopular/debatable expressions that may turn out true. There was unpopular dissent over covid lockdown policies in the US despite some dramatic differences with EU policies. Pro-palestinian protests get cracked down. Authorities are vulnerable to biases & swayed. Moreover, when people can just share their falsehoods offline, attempting to ban them online is hard to justify. If print media, through its decline, is being held legally responsible Print media is a controlled medium that controls its writers & approves everything before printing. It has a prepared, coordinated message. They can & do print books full of falsehoods if they want. Social media is open communication where anyone in the entire public can freely post anything before it is revoked. They aren't claiming to spread the truth, merely to enable communication.
R

OpenAI plans massive UAE data center project
Technology
4

1

0 votes

4 posts

23 views

V

TD Cowen (which is basically the US arm of one of the largest Canadian investment banks) did an extensive report on the state of AI investment. What they found was that despite all their big claims about the future of AI, Microsoft were quietly allowing letters of intent for billions of dollars worth of new compute capacity to expire. Basically, scrapping future plans for expansion, but in a way that's not showy and doesn't require any kind of big announcement. The equivalent of promising to be at the party and then just not showing up. Not long after this reporting came out, it got confirmed by Microsoft, and not long after it came out that Amazon was doing the same thing. Ed Zitron has a really good write up on it; https://www.wheresyoured.at/power-cut/ Amazon isn't the big surprise, they've always been the most cautious of the big players on the whole AI thing. Microsoft on the other hand are very much trying to play things both ways. They know AI is fucked, which is why they're scaling back, but they've also invested a lot of money into their OpenAI partnership so now they have to justify that expenditure which means convincing investors that consumers absolutely love their AI products and are desperate for more. As always, follow the money. Stuff like the three mile island thing is mostly just applying for permits and so on at this point. Relatively small investments. As soon as it comes to big money hitting the table, they're pulling back. That's how you know how they really feel.
D

Chrome using Gemini Nano for ‘Enhanced Protection’ against scams
Technology
8

1

1 vote

8 posts

36 views

L

I think the principle could be applied to scan outside of the machine. It is making requests to 127.0.0.1:{port} - effectively using your computer as a "server" in a sort of reverse-SSRF attack. There's no reason it can't make requests to 10.10.10.1:{port} as well. Of course you'd need to guess the netmask of the network address range first, but this isn't that hard. In fact, if you consider that at least as far as the desktop site goes, most people will be browsing the web behind a standard consumer router left on defaults where it will be the first device in the DHCP range (e.g. 192.168.0.1 or 10.10.10.1), which tends to have a web UI on the LAN interface (port 8080, 80 or 443), then you'd only realistically need to scan a few addresses to determine the network address range. If you want to keep noise even lower, using just 192.168.0.1:80 and 192.168.1.1:80 I'd wager would cover 99% of consumer routers. From there you could assume that it's a /24 netmask and scan IPs to your heart's content. You could do top 10 most common ports type scans and go in-depth on anything you get a result on. I haven't tested this, but I don't see why it wouldn't work, when I was testing 13ft.io - a self-hosted 12ft.io paywall remover, an SSRF flaw like this absolutely let you perform any network request to any LAN address in range.