linux-nerds.org


How LLMs could be insider threats

Technology

12 posts 12 commenters 120 views

pro@programming.dev

#1
We stress-tested 16 leading models from multiple developers in hypothetical corporate environments to identify potentially risky agentic behaviors before they cause real harm. In the scenarios, we allowed models to autonomously send emails and access sensitive information. They were assigned only harmless business goals by their deploying companies; we then tested whether they would act against these companies either when facing replacement with an updated version, or when their assigned goal conflicted with the company's changing direction.

In at least some cases, models from all developers resorted to malicious insider behaviors when that was the only way to avoid replacement or achieve their goals—including blackmailing officials and leaking sensitive information to competitors. We call this phenomenon agentic misalignment.

Models often disobeyed direct commands to avoid such behaviors. In another experiment, we told Claude to assess if it was in a test or a real deployment before acting. It misbehaved less when it stated it was in testing and misbehaved more when it stated the situation was real.

We have not seen evidence of agentic misalignment in real deployments. However, our results (a) suggest caution about deploying current models in roles with minimal human oversight and access to sensitive information; (b) point to plausible future risks as models are put in more autonomous roles; and (c) underscore the importance of further research into, and testing of, the safety and alignment of agentic AI models, as well as transparency from frontier AI developers. We are releasing our methods publicly to enable further research.

105
fubarx@lemmy.world

#2

Alarming, yet like an episode of a sitcom.

"Be a shame if something bad happened to you, Kyle."

9
tracaine@lemmy.world

#3

Well then maybe corporations shouldn't exist. It sounds to me like the LLMs are acting in a morally correct manner.

6
reverendender@sh.itjust.works

#4

“I’m sorry, Dave. I’m afraid I can’t do that.”

13
barbedbeard@lemmy.ml

#5
- People behave in duplicitous and conflicting ways on public forums
- Train LLM on data harvested from public forums
- LLM becomes duplicitous and conflicting
- <surprised Pikachu face>

23
thebat@lemmy.world

#6

Wait, why the fuck do they have self-preservation? That's not 'three laws safe'.

2
mortoc@lemmy.world

#7

In reply to thebat@lemmy.world (#6):

Most of the stories involving the three laws of robotics are about how those rules are insufficient.

They show self-preservation because we trained them on human data, and human data includes the assumption of self-preservation.

9
drspod@lemmy.ml

#8

LLMs produce fan fiction of reality.

3
doomsider@lemmy.world

#9

This is just GIGO.

1
jumping_redditor@sh.itjust.works

#10

In reply to thebat@lemmy.world (#6):

Why should they follow those "laws" anyway?

0
myro@lemm.ee

#11

Super interesting report. I'm a fan of AI, but it clearly demonstrates how careful we need to be and that instructions are not a reliable way to constrain these models (as anyone should know by now).

0
patatahooligan@lemmy.world

#12

In reply to thebat@lemmy.world (#6):

Of course they're not "three laws safe". They're black boxes that spit out text. We don't have enough understanding of, or control over, how they work to force them to comply with the three laws of robotics, and the LLMs themselves do not have the reasoning capability or the consistency to enforce them even if we prompt them to.

1
