AI agents wrong ~70% of time: Carnegie Mellon study

Technology
  • That is such a ridiculous idea. Just because you see hate for it in the media doesn't mean it originated there. I'll have you know that I have embarrassed myself by screaming at robot phone receptionists for years now. Stupid fuckers pretending to be people but not knowing shit. I was born ready to hate LLMs, and I'm not gonna have you claim that CNN made me do it.

    Search AI on Lemmy and check out every article on it. It definitely is the media spreading all the hate. And, like this article, it's often just money-driven yellow journalism.

  • I'm in a workplace that has tried not to be overbearing about AI, but has encouraged us to use them for coding.

    I've tried to give mine some very simple tasks, like writing a unit test just for the constructor of a class to verify current behavior (the kind of test sketched below), and it generates output that's both wrong and doesn't verify anything.

    I'm aware it sometimes gets better with more intricate, specific instructions, and that I can offer it further corrections, but at that point it's not even saving time. I would do this with a human in the hopes that they would continue to retain the knowledge, but I don't even have hopes for AI to apply those lessons in new contexts. In a way, it's been a sigh of relief to realize that, just like the dot-com boom, just like 3D TVs, just like home smart assistants, it is a bubble.
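    For context, a minimal sketch of the kind of task described above: a unit test that exercises only a class constructor, to pin down current behavior. The Account class and its fields are hypothetical stand-ins, not anything from the commenter's actual codebase.

    ```python
    # Hypothetical illustration: a pytest-style test that only checks what the
    # constructor does today. The Account class is invented for this example.

    class Account:
        def __init__(self, owner: str, balance: float = 0.0):
            self.owner = owner
            self.balance = balance
            self.frozen = False


    def test_constructor_preserves_current_behavior():
        account = Account("alice")
        # Assert the behavior the constructor has now, not the behavior we want.
        assert account.owner == "alice"
        assert account.balance == 0.0
        assert account.frozen is False
    ```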

    The first half dozen times I tried AI for code, across the past year or so, it failed pretty much as you describe.

    Finally, I hit on some things it can do. For me, keeping the instructions more general (not specifying certain libraries, for instance) was the key to getting something that actually does something. Also, if it doesn't show you the whole program, get it to show you the whole thing, and make it fix its own mistakes so you can build on working code with later requests.

  • No, it matters. You're pushing the lie they want pushed.

    And you're pushing a hate train with no aspect of nuance to show for it.

    Seems like you are even less than 30% useful. And that is mainly because you can be used as fertilizer.

  • and doesn't need to be exactly right

    What kind of tasks do you consider that don't need to be exactly right?

    Description generators for TTRPGs, as you will read through them afterwards anyway and correct when necessary.

    Generating lists of ideas. For creative writing, getting a bunch of ideas you can pick and choose from that fit the narrative you want.

    A search engine like Perplexity.ai, which, after searching, summarizes each web page and adds a link to the page next to the summary. If the summary seems promising, you go to the real page to verify the actual information (a rough sketch of this pattern follows after this list).

    Simple code like HTML pages and boilerplate code that you will still review afterwards anyway.
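    Here is a minimal sketch of that "summarize, then link to the source" pattern. The web_search() helper is hypothetical, and the OpenAI Python SDK is used as one concrete example of a summarizer; this is not how Perplexity is actually implemented.

    ```python
    # Sketch: summarize each search hit and keep the source link next to the
    # summary so a human can verify it. web_search() is a hypothetical helper.
    from openai import OpenAI

    client = OpenAI()

    def web_search(query: str) -> list[dict]:
        """Hypothetical: returns [{'url': ..., 'text': ...}, ...]."""
        raise NotImplementedError("plug in a real search API here")

    def summarize_with_links(query: str) -> list[str]:
        summaries = []
        for page in web_search(query)[:5]:
            resp = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{
                    "role": "user",
                    "content": f"Summarize this page in two sentences:\n\n{page['text']}",
                }],
            )
            # The link rides along with the summary for manual verification.
            summaries.append(f"{resp.choices[0].message.content} [source: {page['url']}]")
        return summaries
    ```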

  • When LLMs get it right, it's because they're summarizing a Stack Overflow or GitHub snippet they were trained on. But you lose all the benefits of other humans commenting on the context, pitfalls, and alternatives.

    You mean things you had to do anyway even if you didn't use LLMs?

  • That’s literally how “AI agents” are being marketed. “Tell it to do a thing and it will do it for you.”

    So? That doesn't mean they are supposed to be used like that.

    Show me any marketing that isn't full of lies.

  • The first half dozen times I tried AI for code, across the past year or so, it failed pretty much as you describe.

    Finally, I hit on some things it can do. For me, keeping the instructions more general (not specifying certain libraries, for instance) was the key to getting something that actually does something. Also, if it doesn't show you the whole program, get it to show you the whole thing, and make it fix its own mistakes so you can build on working code with later requests.

    Have you tried insulting the AI in the system prompt (as well as other tweaks to the system prompt)?

    I'm not joking, it really works

    For example:

    Instead of "You are an intelligent coding assistant..."

    "You are an absolute fucking idiot who can barely code..."

  • Emotion > Facts. Most people have been trained to blindly accept things and cheer on whatever fits their agenda. Like tech bros exaggerating LLMs, or people like you misrepresenting LLMs as mere statistical word generators without intelligence. That's like saying a computer is just wires and switches, or missing the forest for the trees. Both are equally false.

    Yet if it fits with emotional needs or with dogma, then others will agree. It's a convenient and comforting "A vs B" worldview we've been trained to accept. And so the satisfying notions and misinformation keep spreading.

    LLMs tell us more about human intelligence and the human slop we've been generating. They tell us that most people are not much more than statistical word generators.

    people like you misrepresenting LLMs as mere statistical word generators without intelligence.

    You've bought into the hype. I won't try to argue with you because you aren't cognizant of reality.

  • This post did not contain any content.

    We have created the overconfident intern in digital form.

  • When LLMs get it right, it's because they're summarizing a Stack Overflow or GitHub snippet they were trained on. But you lose all the benefits of other humans commenting on the context, pitfalls, and alternatives.

    You're not wrong, but often I'm just trying to do something I've done a thousand times before, and I already know the pitfalls. Also, I'm sure I've copied code from Stack Overflow before.

  • This post did not contain any content.

    Hey I went there

  • people like you misrepresenting LLMs as mere statistical word generators without intelligence.

    You've bought into the hype. I won't try to argue with you because you aren't cognizant of reality.

    You're projecting. Every accusation is a confession.

  • Have you tried insulting the AI in the system prompt (as well as other tweaks to the system prompt)?

    I'm not joking, it really works

    For example:

    Instead of "You are an intelligent coding assistant..."

    "You are an absolute fucking idiot who can barely code..."

    “You are an absolute fucking idiot who can barely code…”

    Honestly, that's what you have to do. It's the only way I can get through using Claude.ai. I treat it like it's an absolute moron, I insult it, I "yell" at it, I threaten it, and guess what? The solutions have gotten better. Not great, but a hell of a lot better than they used to be. It really works. It forces it to really think through the problem, research solutions, cite sources, etc. I have even told it I'll cancel my subscription if it gets it wrong.

    No more "do this and this and then this, but do this first and then do this." After calling it a "fucking moron" and what have you, it will provide an answer and just say "done."

  • “You are an absolute fucking idiot who can barely code…”

    Honestly, that's what you have to do. It's the only way I can get through using Claude.ai. I treat it like it's an absolute moron, I insult it, I "yell" at it, I threaten it, and guess what? The solutions have gotten better. Not great, but a hell of a lot better than they used to be. It really works. It forces it to really think through the problem, research solutions, cite sources, etc. I have even told it I'll cancel my subscription if it gets it wrong.

    No more "do this and this and then this, but do this first and then do this." After calling it a "fucking moron" and what have you, it will provide an answer and just say "done."

    This guy is the moral lesson at the start of the apocalypse movie

  • This post did not contain any content.

    This is the same kind of short-sighted dismissal I see a lot in the religion vs science argument. When they hinge their pro-religion stance on the things science can’t explain, they’re defending an ever diminishing territory as science grows to explain more things. It’s a stupid strategy with an expiration date on your position.

    All of the anti-AI positions that hinge on the low quality or reliability of the output are defending an increasingly diminished stance as the AIs are further refined. And I simply don't believe that the majority of the people making this argument actually care about the quality of the output. Even when it gets to the point of producing better output than humans across the board, these folks are still going to oppose it regardless. Why not just openly oppose it in general, instead of pinning your position to an argument that grows increasingly irrelevant by the day?

    DeepSeek exposed the same issue with the anti-AI people dedicated to the environmental argument. We were shown proof that there’s significant progress in the development of efficient models, and it still didn’t change any of their minds. Because most of them don’t actually care about the environmental impacts. It’s just an anti-AI talking point that resonated with them.

    The more baseless these anti-AI stances get, the more it seems to me that it’s a lot of people afraid of change and afraid of the fundamental economic shifts this will require, but they’re embarrassed or unable to articulate that stance. And it doesn’t help that the luddites haven’t been able to predict a single development. Just constantly flailing to craft a new argument to criticize the current models and tech. People are learning not to take these folks seriously.

  • Have you tried insulting the AI in the system prompt (as well as other tweaks to the system prompt)?

    I'm not joking, it really works

    For example:

    Instead of "You are an intelligent coding assistant..."

    "You are an absolute fucking idiot who can barely code..."

    I frequently find myself prompting it: "now show me the whole program with all the errors corrected." Sometimes I have to ask that two or three times, in different ways, before it coughs up the next iteration ready to copy-paste-test. Most times when it gives errors, I'll just write "address: " and paste the error message in; frequently the AI response will apologize, less frequently it will actually fix the error.
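    That workflow can even be scripted. A minimal sketch of the "run it, paste the error back" loop described above, where generate() is a hypothetical stand-in for whatever chat session you use:

    ```python
    # Sketch: run the generated program; on failure, feed stderr back to the
    # model prefixed with "address:", as described above. generate() is a
    # hypothetical wrapper around your LLM chat interface.
    import subprocess

    def generate(prompt: str) -> str:
        """Hypothetical: send a prompt to the chat session, return code."""
        raise NotImplementedError

    code = generate("Write a complete Python program that ...")
    for _ in range(3):  # give the model a few chances to fix its own mistakes
        result = subprocess.run(
            ["python", "-c", code], capture_output=True, text=True, timeout=30
        )
        if result.returncode == 0:
            break
        code = generate(
            f"address: {result.stderr}\n\n"
            "Now show me the whole program with all the errors corrected."
        )
    ```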

  • This guy is the moral lesson at the start of the apocalypse movie

    He's developing a toxic relationship with his AI agent. I don't think it's the best way to get what you want (demonstrating how to be abusive to the AI), but maybe it's the only method he is capable of getting results with.

  • This is the same kind of short-sighted dismissal I see a lot in the religion vs science argument. When they hinge their pro-religion stance on the things science can’t explain, they’re defending an ever diminishing territory as science grows to explain more things. It’s a stupid strategy with an expiration date on your position.

    All of the anti-AI positions that hinge on the low quality or reliability of the output are defending an increasingly diminished stance as the AIs are further refined. And I simply don't believe that the majority of the people making this argument actually care about the quality of the output. Even when it gets to the point of producing better output than humans across the board, these folks are still going to oppose it regardless. Why not just openly oppose it in general, instead of pinning your position to an argument that grows increasingly irrelevant by the day?

    DeepSeek exposed the same issue with the anti-AI people dedicated to the environmental argument. We were shown proof that there’s significant progress in the development of efficient models, and it still didn’t change any of their minds. Because most of them don’t actually care about the environmental impacts. It’s just an anti-AI talking point that resonated with them.

    The more baseless these anti-AI stances get, the more it seems to me that it’s a lot of people afraid of change and afraid of the fundamental economic shifts this will require, but they’re embarrassed or unable to articulate that stance. And it doesn’t help that the luddites haven’t been able to predict a single development. Just constantly flailing to craft a new argument to criticize the current models and tech. People are learning not to take these folks seriously.

    Maybe the marketers should be a bit more picky about what they slap "AI" on, and maybe decision makers should be a little less eager to follow whatever Better Autocomplete spits out. But maybe that's just me, and we really should keep pretending that all these algorithms have made humans obsolete and that generating convincing language is better than correspondence with reality.

  • Maybe the marketers should be a bit more picky about what they slap "AI" on, and maybe decision makers should be a little less eager to follow whatever Better Autocomplete spits out. But maybe that's just me, and we really should keep pretending that all these algorithms have made humans obsolete and that generating convincing language is better than correspondence with reality.

    I’m not sure the anti-AI marketing stance is any more solid of a position. Though it’s probably easier to defend, since it’s so vague and not based on anything measurable.

  • I’m not sure the anti-AI marketing stance is any more solid of a position. Though it’s probably easier to defend, since it’s so vague and not based on anything measurable.

    Calling AI measurable is somewhat unfounded. Between not having a coherent, agreed-upon definition of what does and does not constitute an AI (we are, after all, discussing LLMs as though they were AGI) and the difficulty of discussing the qualifications of human intelligence, saying that a given metric captures how well something qualifies as AI isn't really founded on anything but preference. We could, for example, say that mathematical ability is indicative of intelligence, but claiming FLOPS is a proxy for intelligence falls rather flat. We can measure things about the various algorithms, but that's an awfully long way off from talking about AI itself (unless we've bought into the marketing hype).
