linux-nerds.org

AI agents wrong ~70% of time: Carnegie Mellon study

Technology

272 posts, 107 commenters, 79 views

D dreamlandlividity@lemmy.world

It is a lot harder to notice incorrect information in review, than making sure it is correct when writing it.
loonsun@sh.itjust.works

wrote

#117

Depends on the context. There is a lot of work in the scientific methods community on using NLP to augment traditionally fully human processes such as thematic analysis and systematic literature reviews, and you can have validation protocols there without 100% human review.

1
M melvin_ferd@lemmy.world

Are you guys sure? The media seems to be where a lot of LLM hate originates.
timeworntraveler@lemmy.dbzer0.com

wrote

#118

That is such a ridiculous idea. Just because you see hate for it in the media doesn't mean it originated there. I'll have you know that I have embarrassed myself by screaming at robot phone receptionists for years now. Stupid fuckers pretending to be people but not knowing shit. I was born ready to hate LLMs and I'm not gonna have you claim that CNN made me do it.

3
D dreamlandlividity@lemmy.world

It is a lot harder to notice incorrect information in review, than making sure it is correct when writing it.
mangocats@feddit.it

wrote

#119

harder to notice incorrect information in review, than making sure it is correct when writing it.

That depends entirely on your writing method and attention span for review.

Most people make stuff up off the cuff and skim anything longer than 75 words when reviewing, so the bar for AI improving over that is really low.

1
L lepinkainen@lemmy.world

Wrong 70% doing what?

I’ve used LLMs as a Stack Overflow / MSDN replacement for over a year and if they fucked up 7/10 questions I’d stop.

Same with code, any free model can easily generate simple scripts and utilities with maybe 10% error rate, definitely not 70%
timeworntraveler@lemmy.dbzer0.com

wrote

#120

it specifies the tasks in the article

0
N nalivai@discuss.tchncs.de

The person who uses fancy autocomplete to write their code will be exactly the person who thinks they're better than everyone. Those traits are correlated.
kameecoding@lemmy.world

wrote, last edited by kameecoding@lemmy.world

#121

Do you use an IDE for writing your code or do you use a notepad like a "real" programmer?
An IDE like IntelliJ has fancy shit like generating getters, setters, constructors, equals, and hashCode. You should never use those; real programmers write those by hand.

Your attention to detail is very good btw, which I am ofc being sarcastic about, because if you had any you'd have noticed I never said I write my code with ChatGPT. I said unit tests, and SQL for unit tests.

Ofc attention to detail is not a requirement of software engineering, so you should be good. (This was also sarcasm; I feel like you need this to be pointed out for you.)

Also, by your implied logic that code not written by you = bad, no company should ever hire junior engineers. I mean, what are you gonna do? Fucking read the code they wrote?

1
D dylanmorgan@slrpnk.net

Claude why did you make me an appointment with a gynecologist? I need an appointment with my neurologist, I’m a man and I have Parkinson’s.
timeworntraveler@lemmy.dbzer0.com

wrote, last edited by timeworntraveler@lemmy.dbzer0.com

#122

Got it, changing your gender to female. Is there anything else I can assist you with?

0
J jsomae@lemmy.ml

Right, so this is really only useful in cases where either it's vastly easier to verify an answer than posit one, or if a conventional program can verify the result of the AI's output.
mangocats@feddit.it

wrote

#123

It’s usually vastly easier to verify an answer than posit one, if you have the patience to do so.

I'm envisioning a world where multiple AI engines create and check each others' work... the first thing they need to make work to support that scenario is probably fusion power.
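To make the "a conventional program can verify the result" case concrete, here is a minimal sketch, assuming the task is sorting a list and the candidate answers come from some unreliable generator. The function names and the sample proposals are hypothetical stand-ins, not anything from the study or from a real AI API. The point is only that the check is much shorter than a sorting routine would be.

```python
from collections import Counter
from typing import Iterable, Optional, Sequence


def is_valid_sort(original: Sequence[int], candidate: Sequence[int]) -> bool:
    """Cheap check: same elements as the input, in non-decreasing order."""
    same_items = Counter(original) == Counter(candidate)
    ordered = all(a <= b for a, b in zip(candidate, candidate[1:]))
    return same_items and ordered


def first_verified(
    original: Sequence[int], candidates: Iterable[Sequence[int]]
) -> Optional[Sequence[int]]:
    """Return the first candidate that passes the check, or None if none do."""
    for candidate in candidates:
        if is_valid_sort(original, candidate):
            return candidate
    return None


# Hypothetical stand-ins for unreliable generator output:
# one drops an element, one is in the wrong order, one is correct.
data = [3, 1, 2]
proposals = [[1, 2], [3, 2, 1], [1, 2, 3]]
print(first_verified(data, proposals))  # prints [1, 2, 3]
```

This only works when such a checker exists. For tasks where correctness cannot be tested mechanically, the rejected candidates would have to be caught by a human reviewer instead.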

2
O outhouseperilous@lemmy.dbzer0.com

You get how that's fucking useless, generally?
mangocats@feddit.it

wrote

#124

As useless as a cubicle farm full of unsupervised workers.

1
K knock_knock_lemmy_in@lemmy.world

Run something with a 70% failure rate 10x and you get to a cumulative ~97% pass rate.
LLMs don't get tired and they can be run in parallel.
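As a rough check of that arithmetic, assuming every run is independent and fails with probability 0.7, the chance that at least one of n runs passes is 1 - 0.7^n. The function name below is illustrative only.

```python
def at_least_one_pass(failure_rate: float, runs: int) -> float:
    """Probability that at least one of `runs` independent attempts succeeds."""
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    if runs < 0:
        raise ValueError("runs must not be negative")
    # All runs fail with probability failure_rate ** runs; take the complement.
    return 1.0 - failure_rate ** runs


for n in (1, 5, 10):
    print(n, round(at_least_one_pass(0.7, n), 4))
```

For ten runs this comes out at roughly 0.97. The independence assumption is doing a lot of work here: repeated runs of the same model on the same prompt may fail in correlated ways, and the figure says nothing about how you identify which of the ten runs is the good one.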
mangocats@feddit.it

wrote

#125

I have actually been doing this lately: iteratively prompting AI to write software and fix its errors until something useful comes out. It's a lot like machine translation. I speak fluent C++, but I don't speak Rust, but I can hammer away on the AI (with English language prompts) until it produces passable Rust for something I could write for myself in C++ in half the time and effort.

I also don't speak Finnish, but Google Translate can take what I say in English and put it into at least somewhat comprehensible Finnish without egregious translation errors most of the time.

Is this useful? When C++ is getting banned for "security concerns" and Rust is the required language, it's at least a little helpful.

2
M mangocats@feddit.it

As useless as a cubicle farm full of unsupervised workers.
outhouseperilous@lemmy.dbzer0.com

wrote

#126

Those are people who could be living their lives, pursuing their ambitions, whatever. That could get some shit done. Comparison not valid.

4
M mangocats@feddit.it

It’s usually vastly easier to verify an answer than posit one, if you have the patience to do so.

I'm envisioning a world where multiple AI engines create and check each others' work... the first thing they need to make work to support that scenario is probably fusion power.
zbyte64@awful.systems

wrote

#127

It’s usually vastly easier to verify an answer than posit one, if you have the patience to do so.

I usually write 3x the code to test the code itself. Verification is often harder than implementation.

2
P punkwalrus@lemmy.world

I'd compare LLMs to a junior executive. Probably gets the basic stuff right, but check and verify for anything important or complicated. Break tasks down into easier steps.
zbyte64@awful.systems

wrote, last edited by zbyte64@awful.systems

#128

A junior developer actually learns from doing the job, an LLM only learns when they update the training corpus and develop an updated model.

3
Z zbyte64@awful.systems

It’s usually vastly easier to verify an answer than posit one, if you have the patience to do so.

I usually write 3x the code to test the code itself. Verification is often harder than implementation.
mangocats@feddit.it

wrote

#129

Yes, but the test code "writes itself" - the path is clear, you just have to fill in the blanks.

Writing the proper product code in the first place, that's the valuable challenge.

0
E eli001@lemmy.world

This post did not contain any content.
katana314@lemmy.world

wrote

#130

I'm in a workplace that has tried not to be overbearing about AI, but has encouraged us to use them for coding.

I've tried to give mine some very simple tasks like writing a unit test just for the constructor of a class to verify current behavior, and it generates output that's both wrong and doesn't verify anything.
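For reference, a test of the kind described above (covering only a class constructor and pinning down its current behavior) could look roughly like this. The `Account` class is a hypothetical stand-in, since the post does not show the real class, and `run_tests` is just a convenience helper for running the cases without exiting the interpreter.

```python
import unittest


class Account:
    """Hypothetical class under test; stands in for the real one."""

    def __init__(self, owner: str, balance: int = 0) -> None:
        if balance < 0:
            raise ValueError("balance must not be negative")
        self.owner = owner
        self.balance = balance


class AccountConstructorTest(unittest.TestCase):
    """Pins down what the constructor does today, nothing more."""

    def test_stores_arguments(self) -> None:
        account = Account("alice", 10)
        self.assertEqual(account.owner, "alice")
        self.assertEqual(account.balance, 10)

    def test_default_balance_is_zero(self) -> None:
        self.assertEqual(Account("bob").balance, 0)

    def test_rejects_negative_balance(self) -> None:
        with self.assertRaises(ValueError):
            Account("carol", -1)


def run_tests() -> unittest.TestResult:
    """Run the constructor tests and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(AccountConstructorTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_tests().wasSuccessful()` reports whether all three cases passed. Each assertion checks an observable effect of the constructor, which is what separates a test that verifies something from one that merely executes the code.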

I'm aware it sometimes gets better with more intricate, specific instructions, and that I can offer it further corrections, but at that point it's not even saving time. I would do this with a human in the hopes that they would continue to retain the knowledge, but I don't even have hopes for AI to apply those lessons in new contexts. In a way, it's been a sigh of relief to realize just like Dotcom, just like 3D TVs, just like home smart assistants, it is a bubble.

34
S shayeta@feddit.org

How do I set up event driven document ingestion from OneDrive located on an Azure tenant to Amazon DocumentDB? Ingestion must be near-realtime, durable, and have some form of DLQ.
tja@programming.dev

wrote

#131

DocumentDB is not for OneDrive documents (PDFs and such). It's for "documents" as in serialized objects (JSON or BSON).
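To illustrate the distinction, a "document" in the database sense is a nested record that can be serialized to JSON (BSON being a binary encoding of the same kind of structure), not a file such as a PDF. The record and its field names below are made up for illustration, and the sketch only shows the data shape; it does not talk to DocumentDB or any other database.

```python
import json

# Hypothetical record; the field names are invented for this example.
order = {
    "_id": "order-1001",
    "customer": {"name": "Ada", "email": "ada@example.com"},
    "items": [{"sku": "A-1", "qty": 2}, {"sku": "B-7", "qty": 1}],
    "paid": True,
}

# Serialize the object to a JSON string, then parse it back.
serialized = json.dumps(order, sort_keys=True)
restored = json.loads(serialized)

print(serialized)
print(restored == order)  # prints True
```

A pipeline that ingests OneDrive files into such a store would therefore need an extra step that turns each file into a record like this (for example metadata plus extracted text), which is outside what the database itself provides.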

2
K kameecoding@lemmy.world

Do you use an IDE for writing your code or do you use a notepad like a "real" programmer?
An IDE like IntelliJ has fancy shit like generating getters, setters, constructors, equals, and hashCode. You should never use those; real programmers write those by hand.

Your attention to detail is very good btw, which I am ofc being sarcastic about, because if you had any you'd have noticed I never said I write my code with ChatGPT. I said unit tests, and SQL for unit tests.

Ofc attention to detail is not a requirement of software engineering, so you should be good. (This was also sarcasm; I feel like you need this to be pointed out for you.)

Also, by your implied logic that code not written by you = bad, no company should ever hire junior engineers. I mean, what are you gonna do? Fucking read the code they wrote?
nalivai@discuss.tchncs.de

wrote

#132

Were you prone to these weird leaps of logic before your brain was fried by talking to LLMs, or did you start being a fan of talking to LLMs because your ability to logic was...well...that?

0
T timeworntraveler@lemmy.dbzer0.com

AI can't even understand its own brain to write about it
tja@programming.dev

wrote

#133

Neither can we...

0
S suburban_hillbilly@lemmy.ml

Gell-Mann amnesia effect - Wikipedia

(en.m.wikipedia.org)
melvin_ferd@lemmy.world

wrote

#134

Whoa that's like how many colors there are

3
S some_guy@lemmy.sdf.org

Yeah, they’re statistical word generators. There’s no intelligence. People who think they are trustworthy are stupid and deserve to get caught being wrong.
alteredego@lemmy.ml

wrote

#135

Emotion > Facts. Most people have been trained to blindly accept things and cheer on what fits with their agenda. Like tech bros exaggerating LLMs, or people like you misrepresenting LLMs as mere statistical word generators without intelligence. That's like saying a computer is just wires and switches, or missing the forest for the trees. Both are equally false.

Yet if it fits with emotional needs or with dogma, then others will agree. It's a convenient and comforting "A vs B" worldview we've been trained to accept. And so the satisfying notion and misinformation keep spreading.

LLMs tell us more about human intelligence and the human slop we've been generating. They tell us that most people are not that much more than statistical word generators.

2
T timeworntraveler@lemmy.dbzer0.com

Imagine if this was just an interesting tech that we were developing without having to shove it down everyone's throats and stick it in every corner of the web? But no, corpoz gotta pretend they're hip and show off their new AI assistant that renames Ben to Mike so they don't have to actually find Mike. Capitalism ruins everything.
mangocats@feddit.it

wrote

#136

There's a certain amount of: "if this isn't going to take over the world, I'm going to just take my money and put it in something that will" mentality out there. It's not 100% of all investors, but it's pervasive enough that the "potential world beaters" are seriously over-funded as compared to their more modest reliable inflation+10% YoY return alternatives.

5
