linux-nerds.org

Judge backs AI firm over use of copyrighted books

Technology

59 posts 34 posters 546 views

M mudman@fedia.io

It is entirely possible that the entire construct of copyright just isn't fit to regulate this and the "right to train" or to avoid training needs to be formulated separately.

The maximalist, knee-jerk assumption that all AI training is copying is feeding into the interests of, ironically, a bunch of AI companies. That doesn't mean that actual authors and artists don't have an interest in regulating this space.

The big takeaway, in my book, is copyright is finally broken beyond all usability. Let's scrap it and start over with the media landscape we actually have, not the eighteenth century version of it.
H This user is from outside of this forum
hendrik@palaver.p3x.de

wrote on, last edited by hendrik@palaver.p3x.de

#21

I'm fairly certain this is the correct answer here. Also, there is a separation between the judiciary and the legislature. It's the former that is involved here, but we really need to bother the latter. It's the only way, unless we want to use 18th-century tools on the current situation.
1 reply Last reply

4
B bob_omb_battlefield@sh.itjust.works

If you aren't allowed to freely use data for training without a license, then the fear is that only large companies will own enough works or be able to afford licenses to train models.
hendrik@palaver.p3x.de

wrote on, last edited by hendrik@palaver.p3x.de

#22

Yes. But then do something about it. Regulate the market. Or pass laws which address this. I don't really see why we should do something like this, then; it still kind of contributes to the problem, as free rein still advantages big companies.

(And we can write into law whatever we like. It doesn't need to be a stupid and simplistic solution. If you're concerned about big companies, just write that they have to pay a lot and small companies don't. Or force everyone to open their models. Those are all options which can be formulated as a new rule. And those would address the issue at hand.)
1 reply Last reply

2
S sonofantenora@lemmy.world

Cool then, try to do some torrenting out there and don't hide it. Tell us how it goes.

The rules don't change. This just means AI overlords can do it, not that you can do it too.
ofcoursenot@fedia.io

wrote on, last edited by

#23

I've been pirating since Napster, never have hidden shit. It's usually not a crime, except in America it seems, to download content, or even share it freely. What is a crime is to make a business distributing pirated content.
S 1 reply Last reply

3
sonofantenora@lemmy.world

wrote on, last edited by

#24

I know, but you see what they're doing with AI: a small server used for piracy and sharing is punished, in some cases, worse than theft. AI businesses are making bank (or are they? There is still no clear path to profitability) on troves of pirated content. This (for small guys like us) is not going to change the situation. For instance, if we used the same dataset to train some AI in a garage, with no business or investor behind us, things would be different. We're at a stage where AI is quite literally too important to fail for somebody out there. I'd argue that AI is, in fact, going to be shielded for this reason regardless of previous legal outcomes.
H 1 reply Last reply

1
hendrik@palaver.p3x.de

wrote on, last edited by

#25

Agreed. And even if it were, it's always like this. Anthropic is a big company. They likely have millions available for good lawyers, while the small guy doesn't. So they're more able to just do stuff and do away with some legal restrictions. Or they just pay a fine, and that's pocket change for them. So big companies always have more options than the small guy.
1 reply Last reply

1
F facedeer@fedia.io

Did you read the actual order? The detailed conclusions begin on page 9. What specific bits did he get wrong?
viatoromnium@piefed.social

wrote on, last edited by

#26

I'm on page 12 and I already saw a false equivalence between human learning and AI training.
F 1 reply Last reply

9
facedeer@fedia.io

wrote on, last edited by

#27

Is it this?

First, Authors argue that using works to train Claude’s underlying LLMs was like using works to train any person to read and write, so Authors should be able to exclude Anthropic from this use (Opp. 16).

That's the judge addressing an argument that the Authors made. If anyone made a "false equivalence" here, it's the plaintiffs; the judge is simply saying "okay, let's assume their claim is true," as is usual for a preliminary judgment like this.
A 1 reply Last reply

11
O omegamouse@pawb.social

What, how is this a win? Three authors lost a lawsuit to an AI firm using their works.
grimy@lemmy.world

wrote on, last edited by

#28

The lawsuit would not have benefited their fellow authors, but rather their publishing houses and the big AI companies.
1 reply Last reply

4
B bob_omb_battlefield@sh.itjust.works

Yeah, I guess the debate is which is the lesser evil. I didn't make the original comment but I think this is what they were getting at.
grimy@lemmy.world

wrote on, last edited by grimy@lemmy.world

#29

Yes, precisely.

I don't see a situation where the actual content creators get paid.

We either get open source AI, or we get closed AI where the big AI companies and copyright companies make bank.

I think people are having huge knee-jerk reactions and end up supporting companies like Disney, Universal Music and Google.
1 reply Last reply

5
H hendrik@palaver.p3x.de

Keep in mind this isn't about open-weight vs other AI models at all. This is about how training data can be collected and used.
grimy@lemmy.world

wrote on, last edited by

#30

Because of the vast amount of data needed, there will be no competitive, viable open source solution if half the data is kept in a walled garden.

This is about open weights vs closed weights.
J H 2 replies Last reply

5
N nomad_scry@lemmy.sdf.org

If they can just steal a creator's work, how do they suppose creators will be able to afford continuing to be creators?

Right. They think we have enough original works that the machines can just make any new creations.
grimy@lemmy.world

wrote on, last edited by

#31

Companies like record studios, which already own all the copyrights, aren't going to pay creators for something they already own.

All the data has already been signed away. People are really optimistic about an industry that has consistently fucked everyone it interacts with for money.
1 reply Last reply

2
D davriellelouna@lemmy.world

This post did not contain any content.
myopinion@lemmy.today

wrote on, last edited by

#32

I hate AI with a fire that keeps me warm at night. That is all.
1 reply Last reply

6
F facedeer@fedia.io

Is it this?

First, Authors argue that using works to train Claude’s underlying LLMs was like using works to train any person to read and write, so Authors should be able to exclude Anthropic from this use (Opp. 16).

That's the judge addressing an argument that the Authors made. If anyone made a "false equivalence" here, it's the plaintiffs; the judge is simply saying "okay, let's assume their claim is true," as is usual for a preliminary judgment like this.
ag10n@lemmy.world

wrote on, last edited by

#33

On page 6, the judge writes that the LLM “memorized” the content and could “recite” it.

Neither is true in the training or use of LLMs.
A 1 reply Last reply

4
D devfuuu@lemmy.world

That "freely" there really does a lot of hard work.
sculptuspoe@lemmy.world

wrote on, last edited by sculptuspoe@lemmy.world

#34

It means what it means; "freely" pulls its own weight. I didn't say "readily" accessible. Torrents could be viewed as "readily" accessible, but they couldn't be viewed as "freely" accessible, because at the very least you bear the guilt of theft. Library books are "freely" accessible, and if somehow the training involved checking out books and returning them digitally, it should be fine. If it is free to read into neurons, it is free to read into neural systems. If payment for reading is expected, then it isn't free.
W 1 reply Last reply

4
Q quadraturesurfer@lemmy.world

To anyone who is reading this comment without reading through the article: this ruling doesn't mean that it's okay to pirate for building a model. Anthropic will still need to go through trial for that:

But he rejected Anthropic's request to dismiss the case, ruling the firm would have to stand trial over its use of pirated copies to build its library of material.
artisian@lemmy.world

wrote on, last edited by artisian@lemmy.world

#35

I also read through the judgment, and I think it's better for Anthropic than you describe. He distinguishes three issues:

A) Use any written material they get their hands on to train the model (and the resulting model doesn't just reproduce the works).

B) Buy a single copy of a print book, scan it, and retain the digital copy for a company library (for all sorts of future purposes).

C) Pirate a book and retain that copy for a company library (for all sorts of future purposes).

A and B were found to be fair use on summary judgment, meaning this judge thinks it's clear-cut in Anthropic's favor. C will go to trial.
X 1 reply Last reply

14
A ag10n@lemmy.world

On page 6, the judge writes that the LLM “memorized” the content and could “recite” it.

Neither is true in the training or use of LLMs.
artisian@lemmy.world

wrote on, last edited by

#36

Depends on the content and the method. There are tons of ways to encrypt data, and under relevant law they may still count as copies. There are certainly weaker NN models where we can extract a lot of the training data, even if it's not easy, from the model parameters (even if we can't find a prompt that gets the model to regurgitate).
1 reply Last reply

2
A aboubenadhem@lemmy.world

IMO the focus should have always been on the potential for AI to produce copyright-violating output, not on the method of training.
artisian@lemmy.world

wrote on, last edited by artisian@lemmy.world

#37

The plaintiffs made that argument and the judge shot it down pretty hard: that kind of competition isn't what copyright protects against. He makes an analogy with teachers teaching children to write fiction: they are using existing fantasy to create MANY more competitors on the fiction market. Could an author use copyright to challenge that use?

Would love to hear your thoughts on the ruling itself (it's linked by Reuters).
C 1 reply Last reply

4
S sculptuspoe@lemmy.world

It means what it means; "freely" pulls its own weight. I didn't say "readily" accessible. Torrents could be viewed as "readily" accessible, but they couldn't be viewed as "freely" accessible, because at the very least you bear the guilt of theft. Library books are "freely" accessible, and if somehow the training involved checking out books and returning them digitally, it should be fine. If it is free to read into neurons, it is free to read into neural systems. If payment for reading is expected, then it isn't free.
womble@lemmy.world

wrote on, last edited by

#38

Civil cases of copyright infringement are not theft, no matter what the MPAA has trained you to believe.
J 1 reply Last reply

2
A artisian@lemmy.world

I also read through the judgment, and I think it's better for Anthropic than you describe. He distinguishes three issues:

A) Use any written material they get their hands on to train the model (and the resulting model doesn't just reproduce the works).

B) Buy a single copy of a print book, scan it, and retain the digital copy for a company library (for all sorts of future purposes).

C) Pirate a book and retain that copy for a company library (for all sorts of future purposes).

A and B were found to be fair use on summary judgment, meaning this judge thinks it's clear-cut in Anthropic's favor. C will go to trial.
xthexder@l.sw0.com

wrote on, last edited by

#39

C could still bankrupt the company, depending on how the trial goes. They pirated a lot of books.
A X 2 replies Last reply

6
G gaylord_fartmaster@lemmy.world

Because books are used to train both commercial and open source language models?
sentient_loom@sh.itjust.works

wrote on, last edited by

#40

used to train both commercial

commercial training is, in this case, stealing people's work for commercial gain

and open source language models

so, uh, let us train open-source models on open-source text. There's so much of it that there's no need to steal.

?

I'm not sure why you added a question mark at the end of your statement.
G 1 reply Last reply

1
