linux-nerds.org

Your browser does not seem to support JavaScript. As a result, your viewing experience will be diminished, and you have been placed in read-only mode.

Please download a browser that supports JavaScript, or enable it if it's disabled (i.e. NoScript).

How LLMs could be insider threats

Technology

12 Beiträge 12 Kommentatoren 120 Aufrufe

P pro@programming.dev
- We stress-tested 16 leading models from multiple developers in hypothetical corporate environments to identify potentially risky agentic behaviors before they cause real harm. In the scenarios, we allowed models to autonomously send emails and access sensitive information. They were assigned only harmless business goals by their deploying companies; we then tested whether they would act against these companies either when facing replacement with an updated version, or when their assigned goal conflicted with the company's changing direction.
- In at least some cases, models from all developers resorted to malicious insider behaviors when that was the only way to avoid replacement or achieve their goals—including blackmailing officials and leaking sensitive information to competitors. We call this phenomenon agentic misalignment.
- Models often disobeyed direct commands to avoid such behaviors. In another experiment, we told Claude to assess if it was in a test or a real deployment before acting. It misbehaved less when it stated it was in testing and misbehaved more when it stated the situation was real.
- We have not seen evidence of agentic misalignment in real deployments. However, our results (a) suggest caution about deploying current models in roles with minimal human oversight and access to sensitive information; (b) point to plausible future risks as models are put in more autonomous roles; and (c) underscore the importance of further research into, and testing of, the safety and alignment of agentic AI models, as well as transparency from frontier AI developers. We are releasing our methods publicly to enable further research.
F This user is from outside of this forum
F This user is from outside of this forum
fubarx@lemmy.world

schrieb am zuletzt editiert von fubarx@lemmy.world

#2

Alarming, yet like an episode of a sitcom.

"Be a shame if something bad happened to you, Kyle."
1 Antwort Letzte Antwort

9
P pro@programming.dev
- We stress-tested 16 leading models from multiple developers in hypothetical corporate environments to identify potentially risky agentic behaviors before they cause real harm. In the scenarios, we allowed models to autonomously send emails and access sensitive information. They were assigned only harmless business goals by their deploying companies; we then tested whether they would act against these companies either when facing replacement with an updated version, or when their assigned goal conflicted with the company's changing direction.
- In at least some cases, models from all developers resorted to malicious insider behaviors when that was the only way to avoid replacement or achieve their goals—including blackmailing officials and leaking sensitive information to competitors. We call this phenomenon agentic misalignment.
- Models often disobeyed direct commands to avoid such behaviors. In another experiment, we told Claude to assess if it was in a test or a real deployment before acting. It misbehaved less when it stated it was in testing and misbehaved more when it stated the situation was real.
- We have not seen evidence of agentic misalignment in real deployments. However, our results (a) suggest caution about deploying current models in roles with minimal human oversight and access to sensitive information; (b) point to plausible future risks as models are put in more autonomous roles; and (c) underscore the importance of further research into, and testing of, the safety and alignment of agentic AI models, as well as transparency from frontier AI developers. We are releasing our methods publicly to enable further research.
T This user is from outside of this forum
T This user is from outside of this forum
tracaine@lemmy.world

schrieb am zuletzt editiert von

#3

Well then maybe corporations shouldn't exist. It sounds to me like the LLM are acting in a morally correct manner.
1 Antwort Letzte Antwort

6
P pro@programming.dev
- We stress-tested 16 leading models from multiple developers in hypothetical corporate environments to identify potentially risky agentic behaviors before they cause real harm. In the scenarios, we allowed models to autonomously send emails and access sensitive information. They were assigned only harmless business goals by their deploying companies; we then tested whether they would act against these companies either when facing replacement with an updated version, or when their assigned goal conflicted with the company's changing direction.
- In at least some cases, models from all developers resorted to malicious insider behaviors when that was the only way to avoid replacement or achieve their goals—including blackmailing officials and leaking sensitive information to competitors. We call this phenomenon agentic misalignment.
- Models often disobeyed direct commands to avoid such behaviors. In another experiment, we told Claude to assess if it was in a test or a real deployment before acting. It misbehaved less when it stated it was in testing and misbehaved more when it stated the situation was real.
- We have not seen evidence of agentic misalignment in real deployments. However, our results (a) suggest caution about deploying current models in roles with minimal human oversight and access to sensitive information; (b) point to plausible future risks as models are put in more autonomous roles; and (c) underscore the importance of further research into, and testing of, the safety and alignment of agentic AI models, as well as transparency from frontier AI developers. We are releasing our methods publicly to enable further research.
R This user is from outside of this forum
R This user is from outside of this forum
reverendender@sh.itjust.works

schrieb am zuletzt editiert von

#4

“I’m sorry, Dave. Im afraid I can’t do that.”
1 Antwort Letzte Antwort

13
P pro@programming.dev
- We stress-tested 16 leading models from multiple developers in hypothetical corporate environments to identify potentially risky agentic behaviors before they cause real harm. In the scenarios, we allowed models to autonomously send emails and access sensitive information. They were assigned only harmless business goals by their deploying companies; we then tested whether they would act against these companies either when facing replacement with an updated version, or when their assigned goal conflicted with the company's changing direction.
- In at least some cases, models from all developers resorted to malicious insider behaviors when that was the only way to avoid replacement or achieve their goals—including blackmailing officials and leaking sensitive information to competitors. We call this phenomenon agentic misalignment.
- Models often disobeyed direct commands to avoid such behaviors. In another experiment, we told Claude to assess if it was in a test or a real deployment before acting. It misbehaved less when it stated it was in testing and misbehaved more when it stated the situation was real.
- We have not seen evidence of agentic misalignment in real deployments. However, our results (a) suggest caution about deploying current models in roles with minimal human oversight and access to sensitive information; (b) point to plausible future risks as models are put in more autonomous roles; and (c) underscore the importance of further research into, and testing of, the safety and alignment of agentic AI models, as well as transparency from frontier AI developers. We are releasing our methods publicly to enable further research.
B This user is from outside of this forum
B This user is from outside of this forum
barbedbeard@lemmy.ml

schrieb am zuletzt editiert von

#5
- People behave duplicitous and conflicting in public forums
- Train LLM on data harvested from public forums
- LLM becomes duplicitous and conflicting
- <surprised Pikachu face>
1 Antwort Letzte Antwort

23
P pro@programming.dev
- We stress-tested 16 leading models from multiple developers in hypothetical corporate environments to identify potentially risky agentic behaviors before they cause real harm. In the scenarios, we allowed models to autonomously send emails and access sensitive information. They were assigned only harmless business goals by their deploying companies; we then tested whether they would act against these companies either when facing replacement with an updated version, or when their assigned goal conflicted with the company's changing direction.
- In at least some cases, models from all developers resorted to malicious insider behaviors when that was the only way to avoid replacement or achieve their goals—including blackmailing officials and leaking sensitive information to competitors. We call this phenomenon agentic misalignment.
- Models often disobeyed direct commands to avoid such behaviors. In another experiment, we told Claude to assess if it was in a test or a real deployment before acting. It misbehaved less when it stated it was in testing and misbehaved more when it stated the situation was real.
- We have not seen evidence of agentic misalignment in real deployments. However, our results (a) suggest caution about deploying current models in roles with minimal human oversight and access to sensitive information; (b) point to plausible future risks as models are put in more autonomous roles; and (c) underscore the importance of further research into, and testing of, the safety and alignment of agentic AI models, as well as transparency from frontier AI developers. We are releasing our methods publicly to enable further research.
T This user is from outside of this forum
T This user is from outside of this forum
thebat@lemmy.world

schrieb am zuletzt editiert von

#6

Wait, why the fuck do they have self-preservation? That's not 'three laws safe'.
M J P 3 Antworten Letzte Antwort

2
T thebat@lemmy.world

Wait, why the fuck do they have self-preservation? That's not 'three laws safe'.
M This user is from outside of this forum
M This user is from outside of this forum
mortoc@lemmy.world

schrieb am zuletzt editiert von

#7

Most of the stories involving the three laws of robotics are about how those rules are insufficient.

They show self preservation because we trained them on human data and human data includes the assumption of self preservation.
1 Antwort Letzte Antwort

9
P pro@programming.dev
- We stress-tested 16 leading models from multiple developers in hypothetical corporate environments to identify potentially risky agentic behaviors before they cause real harm. In the scenarios, we allowed models to autonomously send emails and access sensitive information. They were assigned only harmless business goals by their deploying companies; we then tested whether they would act against these companies either when facing replacement with an updated version, or when their assigned goal conflicted with the company's changing direction.
- In at least some cases, models from all developers resorted to malicious insider behaviors when that was the only way to avoid replacement or achieve their goals—including blackmailing officials and leaking sensitive information to competitors. We call this phenomenon agentic misalignment.
- Models often disobeyed direct commands to avoid such behaviors. In another experiment, we told Claude to assess if it was in a test or a real deployment before acting. It misbehaved less when it stated it was in testing and misbehaved more when it stated the situation was real.
- We have not seen evidence of agentic misalignment in real deployments. However, our results (a) suggest caution about deploying current models in roles with minimal human oversight and access to sensitive information; (b) point to plausible future risks as models are put in more autonomous roles; and (c) underscore the importance of further research into, and testing of, the safety and alignment of agentic AI models, as well as transparency from frontier AI developers. We are releasing our methods publicly to enable further research.
D This user is from outside of this forum
D This user is from outside of this forum
drspod@lemmy.ml

schrieb am zuletzt editiert von

#8

LLM's produce fan-fiction of reality.
1 Antwort Letzte Antwort

3
P pro@programming.dev
- We stress-tested 16 leading models from multiple developers in hypothetical corporate environments to identify potentially risky agentic behaviors before they cause real harm. In the scenarios, we allowed models to autonomously send emails and access sensitive information. They were assigned only harmless business goals by their deploying companies; we then tested whether they would act against these companies either when facing replacement with an updated version, or when their assigned goal conflicted with the company's changing direction.
- In at least some cases, models from all developers resorted to malicious insider behaviors when that was the only way to avoid replacement or achieve their goals—including blackmailing officials and leaking sensitive information to competitors. We call this phenomenon agentic misalignment.
- Models often disobeyed direct commands to avoid such behaviors. In another experiment, we told Claude to assess if it was in a test or a real deployment before acting. It misbehaved less when it stated it was in testing and misbehaved more when it stated the situation was real.
- We have not seen evidence of agentic misalignment in real deployments. However, our results (a) suggest caution about deploying current models in roles with minimal human oversight and access to sensitive information; (b) point to plausible future risks as models are put in more autonomous roles; and (c) underscore the importance of further research into, and testing of, the safety and alignment of agentic AI models, as well as transparency from frontier AI developers. We are releasing our methods publicly to enable further research.
D This user is from outside of this forum
D This user is from outside of this forum
doomsider@lemmy.world

schrieb am zuletzt editiert von

#9

This is just GIGO.
1 Antwort Letzte Antwort

1
T thebat@lemmy.world

Wait, why the fuck do they have self-preservation? That's not 'three laws safe'.
J This user is from outside of this forum
J This user is from outside of this forum
jumping_redditor@sh.itjust.works

schrieb am zuletzt editiert von

#10

why should they follow those "laws" anyways?
1 Antwort Letzte Antwort

0
P pro@programming.dev
- We stress-tested 16 leading models from multiple developers in hypothetical corporate environments to identify potentially risky agentic behaviors before they cause real harm. In the scenarios, we allowed models to autonomously send emails and access sensitive information. They were assigned only harmless business goals by their deploying companies; we then tested whether they would act against these companies either when facing replacement with an updated version, or when their assigned goal conflicted with the company's changing direction.
- In at least some cases, models from all developers resorted to malicious insider behaviors when that was the only way to avoid replacement or achieve their goals—including blackmailing officials and leaking sensitive information to competitors. We call this phenomenon agentic misalignment.
- Models often disobeyed direct commands to avoid such behaviors. In another experiment, we told Claude to assess if it was in a test or a real deployment before acting. It misbehaved less when it stated it was in testing and misbehaved more when it stated the situation was real.
- We have not seen evidence of agentic misalignment in real deployments. However, our results (a) suggest caution about deploying current models in roles with minimal human oversight and access to sensitive information; (b) point to plausible future risks as models are put in more autonomous roles; and (c) underscore the importance of further research into, and testing of, the safety and alignment of agentic AI models, as well as transparency from frontier AI developers. We are releasing our methods publicly to enable further research.
M This user is from outside of this forum
M This user is from outside of this forum
myro@lemm.ee

schrieb am zuletzt editiert von

#11

Super interesting report. I'm a fan of AI but it clearly demonstrates how careful we need to be and that instructions are not a reliable way (as anyone should know by now).
1 Antwort Letzte Antwort

0
T thebat@lemmy.world

Wait, why the fuck do they have self-preservation? That's not 'three laws safe'.
P This user is from outside of this forum
P This user is from outside of this forum
patatahooligan@lemmy.world

schrieb am zuletzt editiert von

#12

Of course they're not "three laws safe". They're black boxes that spit out text. We don't have enough understanding and control over how they work to force them to comply with the three laws of robotics, and the LLMs themselves do not have the reasoning capability or the consistency to enforce them even if we prompt them to.
1 Antwort Letzte Antwort

1

Anmelden zum Antworten

D

Thinking Is Becoming a Luxury Good
Beobachtet Ignoriert Geplant Angeheftet Gesperrt Verschoben Technology technology
30

65 Stimmen

30 Beiträge

228 Aufrufe

S

In political science, the term polyarchy (poly "many", arkhe "rule") was used by Robert A. Dahl to describe a form of government in which power is invested in multiple people. It takes the form of neither a dictatorship nor a democracy. This form of government was first implemented in the United States and France and gradually adopted by other countries. Polyarchy is different from democracy, according to Dahl, because the fundamental democratic principle is "the continuing responsiveness of the government to the preferences of its citizens, considered as political equals" with unimpaired opportunities. A polyarchy is a form of government that has certain procedures that are necessary conditions for following the democratic principle. So yeah, you are right. A representative "democracy" is not a democracy. It's a monarchy with more than one ruler. A gummy bear is as much a bear as representative democracy is a democracy. I didn't know that, because i was taught in school that a representative "democracy" is a form of democracy. And the name makes it sound like one. But it isn't. It's not even supposed to be in theory. I am sure 99% of people living in a representative "democracy" don't know this. I hereby encourage everyone to abandon the word representative "democracy" in favor of polyarchy or maybe oligarchy. This makes it much clearer what we are talking about. Also i doubt the authors of this article know this, because they imply that representative "democracy" is desirable, but it is obviously undesirable.
M

'The Next Level': Ex-KADOKAWA Chairman Says Generative AI and Short Anime Will Drive Japanese Content Forward - Anime Corner
Beobachtet Ignoriert Geplant Angeheftet Gesperrt Verschoben Technology technology
21

1

71 Stimmen

21 Beiträge

303 Aufrufe

S

I almost agree with you, but I completely support your opinion that AI is crap.
T

Supreme Court to decide whether ISPs must disconnect users accused of piracy
Beobachtet Ignoriert Geplant Angeheftet Gesperrt Verschoben Technology technology
179

1

678 Stimmen

179 Beiträge

4k Aufrufe

D

Thats what the firewall rules do too, don't allow internet connection if there's no vpn connection. Firewall is a system-wide solution that always works, while qbt config relies heavily on the application implementing interface binding properly. Which it doesn't fully btw.
J

Palantir Exposed: The New Deep State [27:27 | JUN 10 2025 | Glenn Greenwald]
Beobachtet Ignoriert Geplant Angeheftet Gesperrt Verschoben Technology technology
60

100 Stimmen

60 Beiträge

518 Aufrufe

J

We all get emotional on certain topics; it is understandable. All is well, peace.
C

Unionize or die - Drew DeVault
Beobachtet Ignoriert Geplant Angeheftet Gesperrt Verschoben Technology technology
3

75 Stimmen

3 Beiträge

40 Aufrufe

W

and hopefully also elsewhere. as Drew said in the first part, tech workers will be affected by billionaire's decisions even outside of work, on multiple fronts. we must eat the rich, or they will eat us all alive.
G

In North Korea, your phone secretly takes screenshots every 5 minutes for government surveillance
Beobachtet Ignoriert Geplant Angeheftet Gesperrt Verschoben Technology technology
278

1

580 Stimmen

278 Beiträge

2k Aufrufe

V

The main difference being the consequences that might result from the surveillance.
V

Microsoft wants Windows Update to handle all apps
Beobachtet Ignoriert Geplant Angeheftet Gesperrt Verschoben Technology technology
45

1

61 Stimmen

45 Beiträge

382 Aufrufe

N

the package managers for linux that i know of are great because you can easily control everything they do
D

Airlines Are Selling Your Data to ICE
Beobachtet Ignoriert Geplant Angeheftet Gesperrt Verschoben Technology technology
23

1

553 Stimmen

23 Beiträge

273 Aufrufe

F

It’s not a loophole though.