linux-nerds.org

AI agents wrong ~70% of time: Carnegie Mellon study

278 posts, 108 commenters, 153 views

morto@piefed.social

and doesn't need to be exactly right

What kind of tasks do you consider that don't need to be exactly right?
honytawk@feddit.nl wrote (last edited by honytawk@feddit.nl):

#141

Description generators for TTRPGs, as you will read through them afterwards anyway and correct when necessary.

Generating lists of ideas. For creative writing, getting a bunch of ideas you can pick and choose from that fit the narrative you want.

A search engine like Perplexity.ai which after searching summarizes the web page and adds a link to the page next to it. If the summary seems promising, you go to the real page to verify the actual information.

Simple code like HTML pages and boilerplate code that you will still review afterwards anyway.
1 reply, 1 vote
zbyte64@awful.systems

When LLMs get it right, it's because they're summarizing a Stack Overflow or GitHub snippet they were trained on. But you lose all the benefits of other humans commenting on the context, pitfalls and other alternatives.
honytawk@feddit.nl wrote:

#142

You mean things you had to do anyway even if you didn't use LLMs?
1 reply, 0 votes
dylanmorgan@slrpnk.net

That’s literally how “AI agents” are being marketed. “Tell it to do a thing and it will do it for you.”
honytawk@feddit.nl wrote:

#143

So? That doesn't mean they are supposed to be used like that.

Show me any marketing that isn't full of lies.
1 reply, 0 votes
mangocats@feddit.it

The first half dozen times I tried AI for code, across the past year or so, it failed pretty much as you describe.

Finally, I hit on some things it can do. For me: keeping the instructions more general, not specifying certain libraries for instance, was the key to getting something that actually does something. Also, if it doesn't show you the whole program, get it to show you the whole thing, and make it fix its own mistakes so you can build on working code with later requests.
vivendi@programming.dev wrote:

#144

Have you tried insulting the AI in the system prompt (as well as other tweaks to the system prompt)?

I'm not joking, it really works

For example:

Instead of "You are an intelligent coding assistant..."

"You are an absolute fucking idiot who can barely code..."
2 replies, 7 votes
alteredego@lemmy.ml

Emotion > Facts. Most people have been trained to blindly accept things and cheer on what fits with their agenda. Like tech bros exaggerating LLMs, or people like you misrepresenting LLMs as mere statistical word generators without intelligence. That's like saying a computer is just wires and switches, or missing the forest for the trees. Both are equally false.

Yet if it fits with their emotional needs or with dogma, then others will agree. It's a convenient and comforting "A vs B" worldview we've been trained to accept. And so the satisfying notion and the misinformation keep spreading.

LLMs tell us more about human intelligence and the human slop we've been generating. They tell us that most people are not that much more than statistical word generators.
some_guy@lemmy.sdf.org wrote:

#145

people like you misrepresenting LLMs as mere statistical word generators without intelligence.

You've bought into the hype. I won't try to argue with you because you aren't cognizant of reality.
1 reply, 5 votes
eli001@lemmy.world

This post did not contain any content.
blackmist@feddit.uk wrote:

#146

We have created the overconfident intern in digital form.
1 reply, 38 votes
potentialproblem@sh.itjust.works wrote, in reply to zbyte64@awful.systems (quoted above):

#147

You’re not wrong, but often I’m just trying to do something I’ve done a thousand times before and I already know the pitfalls. Also, I’m sure I’ve copied code from Stack Overflow before.
1 reply, 0 votes
ileftreddit@lemmy.world wrote, in reply to eli001@lemmy.world:

#148

Hey I went there
1 reply, 0 votes
alteredego@lemmy.ml wrote, in reply to #145:

#149

You're projecting. Every accusation is a confession.
1 reply, 0 votes
rozodru@lemmy.world wrote, in reply to #144:

#150

“You are an absolute fucking idiot who can barely code…”

Honestly, that's what you have to do. It's the only way I can get through using Claude.ai. I treat it like it's an absolute moron, I insult it, I "yell" at it, I threaten it, and guess what? The solutions have gotten better. Not great, but a hell of a lot better than what they used to be. It really works. It forces it to really think through the problem, research solutions, cite sources, etc. I have even told it I'll cancel my subscription if it gets it wrong.

No more "do this and this and then this, but do this first and then do this." After calling it a "fucking moron" and what have you, it will provide an answer and just say "done."
1 reply, 9 votes
dragontypewyvern@midwest.social wrote, in reply to #150:

#151

This guy is the moral lesson at the start of the apocalypse movie
1 reply, 13 votes
surph_ninja@lemmy.world wrote, in reply to eli001@lemmy.world (last edited by surph_ninja@lemmy.world):

#152

This is the same kind of short-sighted dismissal I see a lot in the religion vs science argument. When they hinge their pro-religion stance on the things science can’t explain, they’re defending an ever diminishing territory as science grows to explain more things. It’s a stupid strategy with an expiration date on your position.

All of the anti-AI positions that hinge on the low quality or reliability of the output are defending an increasingly diminished stance as the AIs are further refined. And I simply don’t believe that the majority of the people making this argument actually care about the quality of the output. Even when it gets to the point of producing better output than humans across the board, these folks are still going to oppose it regardless. Why not just openly oppose it in general, instead of pinning your position to an argument that grows increasingly irrelevant by the day?

DeepSeek exposed the same issue with the anti-AI people dedicated to the environmental argument. We were shown proof that there’s significant progress in the development of efficient models, and it still didn’t change any of their minds. Because most of them don’t actually care about the environmental impacts. It’s just an anti-AI talking point that resonated with them.

The more baseless these anti-AI stances get, the more it seems to me that it’s a lot of people afraid of change and afraid of the fundamental economic shifts this will require, but they’re embarrassed or unable to articulate that stance. And it doesn’t help that the luddites haven’t been able to predict a single development. Just constantly flailing to craft a new argument to criticize the current models and tech. People are learning not to take these folks seriously.
2 replies, 4 votes
mangocats@feddit.it wrote, in reply to #144:

#153

I frequently find myself prompting it: "now show me the whole program with all the errors corrected." Sometimes I have to ask that two or three times, different ways, before it coughs up the next iteration ready to copy-paste-test. Most times when it gives errors I'll just write "address: " and copy-paste the error message in - frequently the text of the AI response will apologize, less frequently it will actually fix the error.
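The "address: " habit described above can be sketched as a small loop. This is only a rough illustration: `ask_model` is a hypothetical callable standing in for whatever chat interface is used, `repair_loop` and `fake_model_factory` are made-up names, and the only check performed is a syntax check with Python's built-in `compile()`, not a real test run.

```python
from typing import Callable, Optional


def repair_loop(ask_model: Callable[[str], str], task: str,
                max_rounds: int = 3) -> Optional[str]:
    """Ask for a whole program, then feed errors back until it compiles."""
    prompt = task + "\nShow me the whole program."
    for _ in range(max_rounds):
        source = ask_model(prompt)
        try:
            # Syntax check only; a real workflow would also run the program.
            compile(source, "<candidate>", "exec")
        except SyntaxError as err:
            # Mirror the "address: <error message>" prompt from the post.
            prompt = f"address: {err.msg} (line {err.lineno})"
            continue
        return source
    # The model never produced something that compiles within the budget.
    return None


def fake_model_factory() -> Callable[[str], str]:
    """Hypothetical stand-in for a model: broken first, fixed on the retry."""
    replies = iter(["print('hello'", "print('hello')"])
    return lambda prompt: next(replies)


if __name__ == "__main__":
    print(repair_loop(fake_model_factory(), "Print a greeting."))
```

The `max_rounds` cap reflects the experience in the post: sometimes the reply apologizes without fixing anything, so the loop needs a point at which it gives up.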
1 reply, 4 votes
mangocats@feddit.it wrote, in reply to #151:

#154

He's developing a toxic relationship with his AI agent. I don't think it's the best way to get what you want (demonstrating how to be abusive to the AI), but maybe it's the only method he is capable of getting results with.
1 reply, 4 votes
chaonaut@lemmy.4d2.org wrote, in reply to #152:

#155

Maybe the marketers should be a bit more picky about what they slap "AI" on and maybe decision makers should be a little less eager to follow whatever Better Autocomplete spits out, but maybe that's just me and we really should be pretending that all these algorithms really have made humans obsolete and generating convincing language is better than correspondence with reality.
1 reply, 5 votes
surph_ninja@lemmy.world wrote, in reply to #155:

#156

I’m not sure the anti-AI marketing stance is any more solid of a position. Though it’s probably easier to defend, since it’s so vague and not based on anything measurable.
1 reply, 3 votes
chaonaut@lemmy.4d2.org wrote, in reply to #156:

#157

Calling AI measurable is somewhat unfounded. Between not having a coherent, agreed-upon definition of what does and does not constitute an AI (we are, after all, discussing LLMs as though they were AGI), and the difficulty that exists in discussing the qualifications of human intelligence, saying that a given metric covers how well a thing is an AI isn't really founded on anything but preference. We could, for example, say that mathematical ability is indicative of intelligence, but claiming FLOPS is a proxy for intelligence falls rather flat. We can measure things about the various algorithms, but that's an awful long ways off from talking about AI itself (unless we've bought into the marketing hype).
1 reply, 3 votes
surph_ninja@lemmy.world wrote, in reply to #157 (last edited by surph_ninja@lemmy.world):

#158

So you’re saying the article’s measurements about AI agents being wrong 70% of the time are made up? Or is AI performance only measurable when the results help anti-AI narratives?
2 replies, 0 votes
fogetaboutit@programming.dev wrote, in reply to eli001@lemmy.world:

#159

please bro just one hundred more GPU and one more billion dollars of research, we make it good please bro
2 replies, 78 votes
zbyte64@awful.systems

It’s usually vastly easier to verify an answer than posit one, if you have the patience to do so.

I usually write 3x the code to test the code itself. Verification is often harder than implementation.
jsomae@lemmy.ml wrote (last edited by jsomae@lemmy.ml):

#160

It really depends on the context. Some domains require solving problems in NP, yet most of the instances that actually come up turn out not to be hard to solve by hand with a bit of tinkering. SAT solvers might completely fail, but humans can do it. Often this means there's a better algorithm that can exploit commonalities in the data. But a brute-force approach might just be to give it to an LLM and then verify its answer. Verifying a proposed solution to an NP problem is easy (it takes polynomial time).

(This is speculation.)
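To make the "verifying is easy" point concrete, here is a minimal sketch of checking a proposed SAT assignment in time linear in the size of the formula. The encoding (signed integers for literals, as in the DIMACS convention), the name `satisfies`, and the example formula are all assumptions for illustration; the candidate assignment just stands in for whatever guess an LLM, a human, or a solver hands back.

```python
from typing import Dict, List

Clause = List[int]  # signed literals: 3 means x3, -3 means NOT x3


def satisfies(clauses: List[Clause], assignment: Dict[int, bool]) -> bool:
    """Return True if every clause contains at least one true literal."""
    for clause in clauses:
        clause_ok = any(
            abs(lit) in assignment and assignment[abs(lit)] == (lit > 0)
            for lit in clause
        )
        if not clause_ok:
            # One unsatisfied clause is enough to reject the whole guess.
            return False
    return True


# (x1 OR NOT x2) AND (x2 OR x3) AND (NOT x1 OR NOT x3)
formula = [[1, -2], [2, 3], [-1, -3]]

# Hypothetical candidate, e.g. a guess proposed by an LLM.
candidate = {1: True, 2: True, 3: False}

if __name__ == "__main__":
    print(satisfies(formula, candidate))
```

Finding a satisfying assignment may take exponential time in the worst case, but the check above touches each literal once, which is why "guess, then verify" is cheap on the verification side regardless of where the guess came from.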
1 reply, 1 vote
