linux-nerds.org


AI agents wrong ~70% of time: Carnegie Mellon study

Technology

88 posts, 51 commenters, 0 views

M morto@piefed.social

and doesn't need to be exactly right

What kind of tasks do you consider that don't need to be exactly right?
korhaka@sopuli.xyz wrote:

#61

Make a basic HTML template. I'll be changing it up anyway.

0
J jsomae@lemmy.ml

I'd just like to point out that, from the perspective of somebody watching AI develop for the past 10 years, completing 30% of automated tasks successfully is pretty good! Ten years ago they could not do this at all. Overlooking all the other issues with AI, I think we are all irritated with the AI hype people for saying things like they can be right 100% of the time -- Amazon's new CEO actually said they would be able to achieve 100% accuracy this year, lmao. But being able to do 30% of tasks successfully is already useful.
outhouseperilous@lemmy.dbzer0.com wrote:

#62

Please stop.

3
E eli001@lemmy.world

This post did not contain any content.
lmagitem@lemmy.zip wrote:

#63

Color me surprised

0
M morto@piefed.social

and doesn't need to be exactly right

What kind of tasks do you consider that don't need to be exactly right?
spankmonkey@lemmy.world wrote:

#64

Things that serve as inspiration or approximations. Layout examples, possible correlations between data sets that need coincidence to be filtered out, estimating timelines, and basically anything that is close enough for a human to take the output and then do something with it.

For example, if you put in a list of ingredients it can spit out recipes that may or may not be what you want, but it can be an inspiration. Taking the output and cooking without any review and consideration would be risky.

0
M melvin_ferd@lemmy.world

OK, what about tech journalists who produced articles with those misunderstandings? Surely they know better, yet they still produce articles like this. And I assume the people who care enough about this topic to post these articles usually know better too, yet they still spread this crap.
some_guy@lemmy.sdf.org wrote:

#65

Check out Ed Zitron's angry reporting on tech journalists fawning over this garbage and reporting on it uncritically. He has a newsletter and a podcast.

2
T tankovayadiviziya@lemmy.world

At least AI won't fire you.
corkyskog@sh.itjust.works wrote:

#66

It kinda does when you ask it something it doesn't like.

2
M morto@piefed.social

and doesn't need to be exactly right

What kind of tasks do you consider that don't need to be exactly right?
sheeettin@lemmy.zip wrote (last edited by sheeettin@lemmy.zip):

#67

Most. I've used ChatGPT to sketch an outline of a document, reformulate accomplishments into review bullets, rephrase a task I didn't understand, and similar stuff. None of it needed to be anywhere near perfect or complete.

Edit: and my favorite, "what's the word for..."

1
O outhouseperilous@lemmy.dbzer0.com

Please stop.
jsomae@lemmy.ml wrote:

#68

I'm not claiming that the use of AI is ethical. If you want to fight back you have to take it seriously though.

7
J jsomae@lemmy.ml

I'm not claiming that the use of AI is ethical. If you want to fight back you have to take it seriously though.
outhouseperilous@lemmy.dbzer0.com wrote:

#69

It can't do 30% of tasks correctly. It can do tasks correctly as much as 30% of the time, and since it's LLM shit you know those numbers have been more massaged than any human in history has ever been.

1
O outhouseperilous@lemmy.dbzer0.com

It can't do 30% of tasks correctly. It can do tasks correctly as much as 30% of the time, and since it's LLM shit you know those numbers have been more massaged than any human in history has ever been.
jsomae@lemmy.ml wrote:

#70

I meant the latter, not "it can do 30% of tasks correctly 100% of the time."

2
J jsomae@lemmy.ml

I meant the latter, not "it can do 30% of tasks correctly 100% of the time."
outhouseperilous@lemmy.dbzer0.com wrote:

#71

You get how that's fucking useless, generally?

0
S synae@lemmy.sdf.org

... And nowadays they let the LLM help with the bullshittery
melvin_ferd@lemmy.world wrote:

#72

Are you guys sure? The media seems to be where a lot of LLM hate originates.

0
J jsomae@lemmy.ml

I'd just like to point out that, from the perspective of somebody watching AI develop for the past 10 years, completing 30% of automated tasks successfully is pretty good! Ten years ago they could not do this at all. Overlooking all the other issues with AI, I think we are all irritated with the AI hype people for saying things like they can be right 100% of the time -- Amazon's new CEO actually said they would be able to achieve 100% accuracy this year, lmao. But being able to do 30% of tasks successfully is already useful.
shayeta@feddit.org wrote:

#73

It doesn't matter if you need a human to review. AI has no way of distinguishing between success and failure. Either way a human will have to review 100% of those tasks.

8
C criss_cross@lemmy.world

I’m sorry as an AI I cannot physically color you shocked. I can help you with AWS services and questions.
shayeta@feddit.org wrote:

#74

How do I set up event driven document ingestion from OneDrive located on an Azure tenant to Amazon DocumentDB? Ingestion must be near-realtime, durable, and have some form of DLQ.

1
M melvin_ferd@lemmy.world

Are you guys sure? The media seems to be where a lot of LLM hate originates.
synae@lemmy.sdf.org wrote:

#75

Whatever gets ad views

0
O outhouseperilous@lemmy.dbzer0.com

You get how that's fucking useless, generally?
jsomae@lemmy.ml wrote:

#76

Yes, that's generally useless. It should not be shoved down people's throats. 30% accuracy still has its uses, especially if the result can be programmatically verified.

2
S shayeta@feddit.org

It doesn't matter if you need a human to review. AI has no way of distinguishing between success and failure. Either way a human will have to review 100% of those tasks.
jsomae@lemmy.ml wrote:

#77

Right, so this is really only useful in cases where either it's vastly easier to verify an answer than posit one, or if a conventional program can verify the result of the AI's output.
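
To make the "a conventional program can verify the result" case concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than anything from the study or a real agent API: `flaky_factor_guesser` is a hypothetical stand-in for an agent that is right about 30% of the time, and the task (finding a nontrivial factor of 391) was picked only because checking a candidate is one modulo operation while producing one takes a search.

```python
import random
from typing import Callable, Optional, Tuple

N = 391  # 17 * 23; illustrative target, not from the thread


def generate_until_verified(
    generate: Callable[[], int],
    verify: Callable[[int], bool],
    max_attempts: int = 20,
) -> Optional[Tuple[int, int]]:
    """Call an unreliable generator until a cheap verifier accepts its output.

    Returns (answer, attempts_used), or None if every attempt was rejected.
    """
    for attempt in range(1, max_attempts + 1):
        candidate = generate()
        if verify(candidate):
            return candidate, attempt
    return None


def flaky_factor_guesser(rng: random.Random) -> int:
    """Hypothetical stand-in for an AI agent: correct roughly 30% of the time."""
    if rng.random() < 0.3:
        return 17
    return rng.choice([3, 5, 7, 11, 13])  # none of these divide 391


def divides_n(candidate: int) -> bool:
    """The conventional program: verifying a factor is a single modulo check."""
    return 1 < candidate < N and N % candidate == 0


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so the demo is repeatable
    print(generate_until_verified(lambda: flaky_factor_guesser(rng), divides_n))
```

The caller never has to trust an individual answer, only `divides_n`. That only helps for tasks where such a verifier exists, which is the restriction described above.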

2
S shayeta@feddit.org

How do I set up event driven document ingestion from OneDrive located on an Azure tenant to Amazon DocumentDB? Ingestion must be near-realtime, durable, and have some form of DLQ.
criss_cross@lemmy.world wrote:

#78

I see you mention Azure and will assume you’re doing a one time migration.

Start by moving everything from OneDrive to S3. As an AI I’m told that bitches love S3. From there you can subscribe to create events on buckets and add events to an SQS queue. Here you can enable a DLQ for failed events.

From there, add a Lambda to listen for SQS events. You should enable provisioned concurrency for speed, for the ability for AWS to bill you more, and so that you can have a dandy of a time figuring out why an old version of your Lambda is still running even though you deployed the latest version, and why everything telling you that creating a new ID for the Lambda each time will fix it fucking lies.

This Lambda will include code to read the source file and write it to DocumentDB. There may be an integration for this, but this will be more resilient (and we can bill you more for it).

Would you like to see sample CDK code? Tough shit because all I can do is assist with questions on AWS services.
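
Joking aside, the S3 to SQS to Lambda flow described above can be sketched in Python. This is a minimal sketch, not sample CDK and not anyone's production code: `make_handler` and `write_document` are hypothetical names, and the DocumentDB write is left as a callback the caller supplies. It assumes the Lambda's SQS event source mapping has `ReportBatchItemFailures` enabled, so that only the failed messages are retried and end up in the DLQ once the queue's `maxReceiveCount` is exceeded.

```python
import json
from typing import Any, Callable, Dict, List
from urllib.parse import unquote_plus


def make_handler(write_document: Callable[[str, str], None]):
    """Build an SQS-triggered Lambda handler.

    write_document(bucket, key) is a hypothetical callback that would read
    the object from S3 and write it to DocumentDB.
    """

    def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, List[Dict[str, str]]]:
        failures: List[Dict[str, str]] = []
        for message in event.get("Records", []):
            try:
                # Each SQS message body carries one S3 event notification.
                s3_event = json.loads(message["body"])
                for record in s3_event.get("Records", []):
                    bucket = record["s3"]["bucket"]["name"]
                    # Object keys arrive URL-encoded in S3 notifications.
                    key = unquote_plus(record["s3"]["object"]["key"])
                    write_document(bucket, key)
            except Exception:
                # Report only this message as failed so SQS redelivers it.
                failures.append({"itemIdentifier": message["messageId"]})
        return {"batchItemFailures": failures}

    return handler
```

Returning the failed message IDs instead of raising keeps one bad document from forcing a retry of the whole batch.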

1
J jsomae@lemmy.ml

Yes, that's generally useless. It should not be shoved down people's throats. 30% accuracy still has its uses, especially if the result can be programmatically verified.
outhouseperilous@lemmy.dbzer0.com wrote (last edited by outhouseperilous@lemmy.dbzer0.com):

#79

Less broadly useful than 20 tons of mixed texture human shit, and more ecologically devastating.

0
O outhouseperilous@lemmy.dbzer0.com

Less broadly useful than 20 tons of mixed texture human shit, and more ecologically devastating.
jsomae@lemmy.ml wrote:

#80

Are you just trolling, or do you seriously not understand how something that can do a task correctly with 30% reliability can be made useful if the result can be automatically verified?
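
For a rough sense of the numbers, here is a small sketch of the arithmetic. It assumes each attempt succeeds independently with probability 0.3, which is a simplifying assumption: real agent failures on the same task are likely correlated, so retries would help less than this suggests. The function names are made up for illustration.

```python
def p_any_success(p: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1.0 - (1.0 - p) ** attempts


def expected_attempts(p: float) -> float:
    """Mean number of tries until the first success (geometric distribution)."""
    return 1.0 / p


if __name__ == "__main__":
    for k in (1, 2, 5, 10):
        print(f"{k:>2} attempts: {p_any_success(0.3, k):.1%} chance of a verified result")
    print(f"expected attempts per success: {expected_attempts(0.3):.2f}")
```

Under that assumption, five verified retries produce an accepted result about 83% of the time and ten about 97%, at an average cost of a little over three attempts per success.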

4


OpenAI's $210K Residency Program Tackles AI Talent Shortage
Technology
7 votes, 2 posts, 1 view

Why don’t they use AI to replace human AI developers?
Marginalized Americans are highly skeptical of artificial intelligence
Technology
206 votes, 34 posts, 104 views

I looked into that and the only question I really have is how geographically distributed the samples were. Other than that, it was an oversampled study, so <50% of the people were the control, of sorts. I don't fully understand how the sampling worked, but there is a substantial chart at the bottom of the study that shows the full distribution of responses. Even with under 1000 people, it seems legit.
Microsoft Shifts Gears On AI Chip Design Plans
Technology
19 votes, 2 posts, 12 views

AI needs to be regulated with an energy cap. If you need more capacity, optimise your AI. Don't just throw more electricity at it.
The Decline of Usability: Revisited | datagubbe.se
Technology
0 votes, 2 posts, 11 views

Just saw this article linked in a ThePrimeagen video. I didn't watch the video, but I did read the article, and all of this article is exactly what I'm always saying when I'm complaining about current UI trends and why I'm so picky about the software I use and also the tools I use to write software. I shouldn't have to be picky, but it seems like developers (professional and hobbyist alike) don't care anymore and users don't have standards.
YouTube might slow down your videos if you block ads
Technology
650 votes, 226 posts, 256 views

[image: 24aa87b2-162d-4296-aaf7-31d42f30ed63.png]
Russia's State Duma passes bill to create state messaging app as it considers blocking WhatsApp
Technology
132 votes, 16 posts, 56 views

Ah, yes. That's correct, sorry I misunderstood you. Yeah that's pretty lame that it doesn't work on desktop. I remember wanting to use that several times.
I am disappointed in the AI discourse
Technology
7 votes, 27 posts, 90 views

I apologize that apparently Lemmy/Reddit people do not have enough self-awareness to accept good criticism, especially if it was just automatically generated, and have downvoted that to oblivion. Though I don't really think you should respond to comments with a ChatGPT link; it's not exactly helpful. Comes off a tad bit AI Bro...
Nextcloud cries foul over Google Play Store app rejection
Technology
256 votes, 31 posts, 101 views

I have the regular F-Droid and it does automatic updates now.