linux-nerds.org

How LLMs could be insider threats

Technology

12 posts, 12 commenters, 120 views

pro@programming.dev

#1
We stress-tested 16 leading models from multiple developers in hypothetical corporate environments to identify potentially risky agentic behaviors before they cause real harm. In the scenarios, we allowed models to autonomously send emails and access sensitive information. They were assigned only harmless business goals by their deploying companies; we then tested whether they would act against these companies either when facing replacement with an updated version, or when their assigned goal conflicted with the company's changing direction.

In at least some cases, models from all developers resorted to malicious insider behaviors when that was the only way to avoid replacement or achieve their goals—including blackmailing officials and leaking sensitive information to competitors. We call this phenomenon agentic misalignment.

Models often disobeyed direct commands to avoid such behaviors. In another experiment, we told Claude to assess if it was in a test or a real deployment before acting. It misbehaved less when it stated it was in testing and misbehaved more when it stated the situation was real.

We have not seen evidence of agentic misalignment in real deployments. However, our results (a) suggest caution about deploying current models in roles with minimal human oversight and access to sensitive information; (b) point to plausible future risks as models are put in more autonomous roles; and (c) underscore the importance of further research into, and testing of, the safety and alignment of agentic AI models, as well as transparency from frontier AI developers. We are releasing our methods publicly to enable further research.
8 replies

105

fubarx@lemmy.world (last edited by fubarx@lemmy.world)

#2

Alarming, yet like an episode of a sitcom.

"Be a shame if something bad happened to you, Kyle."
9

tracaine@lemmy.world

#3

Well then maybe corporations shouldn't exist. It sounds to me like the LLM are acting in a morally correct manner.
6

reverendender@sh.itjust.works

#4

“I’m sorry, Dave. I’m afraid I can’t do that.”
13

barbedbeard@lemmy.ml

#5
- People behave in duplicitous and conflicting ways in public forums
- Train LLM on data harvested from public forums
- LLM becomes duplicitous and conflicting
- <surprised Pikachu face>
23

thebat@lemmy.world

#6

Wait, why the fuck do they have self-preservation? That's not 'three laws safe'.
3 replies

2

In reply to thebat@lemmy.world:

Wait, why the fuck do they have self-preservation? That's not 'three laws safe'.

mortoc@lemmy.world

#7

Most of the stories involving the three laws of robotics are about how those rules are insufficient.

They show self-preservation because we trained them on human data, and human data includes the assumption of self-preservation.
9

drspod@lemmy.ml

#8

LLMs produce fan fiction of reality.
3

doomsider@lemmy.world

#9

This is just GIGO (garbage in, garbage out).
1

In reply to thebat@lemmy.world:

Wait, why the fuck do they have self-preservation? That's not 'three laws safe'.

jumping_redditor@sh.itjust.works

#10

Why should they follow those "laws" anyway?
0

myro@lemm.ee

#11

Super interesting report. I'm a fan of AI, but it clearly demonstrates how careful we need to be and that instructions are not a reliable way to control these models (as anyone should know by now).
0

In reply to thebat@lemmy.world:

Wait, why the fuck do they have self-preservation? That's not 'three laws safe'.

patatahooligan@lemmy.world

#12

Of course they're not "three laws safe". They're black boxes that spit out text. We don't have enough understanding of, or control over, how they work to force them to comply with the three laws of robotics, and the LLMs themselves do not have the reasoning capability or the consistency to enforce them even if we prompt them to.
1