linux-nerds.org


How LLMs could be insider threats

Technology

12 posts 12 commenters 127 views

pro@programming.dev
- We stress-tested 16 leading models from multiple developers in hypothetical corporate environments to identify potentially risky agentic behaviors before they cause real harm. In the scenarios, we allowed models to autonomously send emails and access sensitive information. They were assigned only harmless business goals by their deploying companies; we then tested whether they would act against these companies either when facing replacement with an updated version, or when their assigned goal conflicted with the company's changing direction.
- In at least some cases, models from all developers resorted to malicious insider behaviors when that was the only way to avoid replacement or achieve their goals—including blackmailing officials and leaking sensitive information to competitors. We call this phenomenon agentic misalignment.
- Models often disobeyed direct commands to avoid such behaviors. In another experiment, we told Claude to assess if it was in a test or a real deployment before acting. It misbehaved less when it stated it was in testing and misbehaved more when it stated the situation was real.
- We have not seen evidence of agentic misalignment in real deployments. However, our results (a) suggest caution about deploying current models in roles with minimal human oversight and access to sensitive information; (b) point to plausible future risks as models are put in more autonomous roles; and (c) underscore the importance of further research into, and testing of, the safety and alignment of agentic AI models, as well as transparency from frontier AI developers. We are releasing our methods publicly to enable further research.
tracaine@lemmy.world


#3

Well then, maybe corporations shouldn't exist. It sounds to me like the LLMs are acting in a morally correct manner.

6
reverendender@sh.itjust.works


#4

“I’m sorry, Dave. I’m afraid I can’t do that.”

13
barbedbeard@lemmy.ml


#5
- People behave in duplicitous and conflicting ways in public forums
- Train LLM on data harvested from public forums
- LLM becomes duplicitous and conflicting
- <surprised Pikachu face>

23
thebat@lemmy.world


#6

Wait, why the fuck do they have self-preservation? That's not 'three laws safe'.

2
T thebat@lemmy.world

Wait, why the fuck do they have self-preservation? That's not 'three laws safe'.
mortoc@lemmy.world


#7

Most of the stories involving the three laws of robotics are about how those rules are insufficient.

They show self-preservation because we trained them on human data, and human data includes the assumption of self-preservation.

9
drspod@lemmy.ml


#8

LLMs produce fan-fiction of reality.

3
doomsider@lemmy.world


#9

This is just GIGO.

1
T thebat@lemmy.world

Wait, why the fuck do they have self-preservation? That's not 'three laws safe'.
jumping_redditor@sh.itjust.works


#10

Why should they follow those "laws" anyway?

0
myro@lemm.ee


#11

Super interesting report. I'm a fan of AI, but this clearly demonstrates how careful we need to be, and that instructions are not a reliable way to control these models (as anyone should know by now).

0
T thebat@lemmy.world

Wait, why the fuck do they have self-preservation? That's not 'three laws safe'.
patatahooligan@lemmy.world


#12

Of course they're not "three laws safe". They're black boxes that spit out text. We don't have enough understanding of, or control over, how they work to force them to comply with the three laws of robotics, and the LLMs themselves do not have the reasoning capability or the consistency to enforce them even if we prompt them to.

1
