Judge backs AI firm over use of copyrighted books
-
What, how is this a win? Three authors lost a lawsuit to an AI firm using their works.
The lawsuit would not have benefited their fellow authors, but their publishing houses and the big AI companies.
-
Yeah, I guess the debate is which is the lesser evil. I didn't make the original comment but I think this is what they were getting at.
Yes precisely.
I don't see a situation where the actual content creators get paid.
We either get open-source AI, or we get closed AI where the big AI companies and copyright holders make bank.
I think people are having huge knee-jerk reactions and end up supporting companies like Disney, Universal Music and Google.
-
Keep in mind this isn't about open-weight vs other AI models at all. This is about how training data can be collected and used.
Because of the vast amount of data needed, there will be no competitively viable open-source solution if half the data is kept in a walled garden.
This is about open weights vs closed weights.
-
If they can just steal a creator's work, how do they suppose creators will be able to afford continuing to be creators?
Right. They think we already have enough original works that the machines can just generate any new creations.
Companies like the record labels, who already own all the copyrights, aren't going to pay creators for something they already own.
All the data has already been signed away. People are really optimistic about an industry that has consistently fucked everyone they interact with for money.
-
This post did not contain any content.
I hate AI with a fire that keeps me warm at night. That is all.
-
Is it this?
First, Authors argue that using works to train Claude’s underlying LLMs was like using works to train any person to read and write, so Authors should be able to exclude Anthropic from this use (Opp. 16).
That's the judge addressing an argument that the Authors made. If anyone made a "false equivalence" here, it's the plaintiffs; the judge is simply saying "okay, let's assume their claim is true," as is usual for a preliminary judgment like this.
On page 6 the judge writes that the LLM "memorized" the content and could "recite" it.
Neither is true in the training or use of LLMs.
-
That "freely" there really does a lot of hard work.
It means what it means; "freely" pulls its own weight. I didn't say "readily" accessible. Torrents could be viewed as "readily" accessible, but they couldn't be viewed as "freely" accessible, because at the very least you bear the guilt of theft. Library books are "freely" accessible, and if somehow the training involved checking out books and returning them digitally, it should be fine. If it is free to read into neurons, it is free to read into neural systems. If payment for reading is expected, then it isn't free.
-
To anyone reading this comment without reading through the article: this ruling doesn't mean it's okay to pirate for building a model. Anthropic will still need to go through trial for that:
But he rejected Anthropic's request to dismiss the case, ruling the firm would have to stand trial over its use of pirated copies to build its library of material.
I also read through the judgment, and I think it's better for Anthropic than you describe. He distinguishes three issues:
A) Use any written material they get their hands on to train the model (and the resulting model doesn't just reproduce the works).
B) Buy a single copy of a print book, scan it, and retain the digital copy for a company library (for all sorts of future purposes).
C) Pirate a book and retain that copy for a company library (for all sorts of future purposes).
A and B were fair use on summary judgment, meaning this judge thinks it's clear-cut in Anthropic's favor. C will go to trial.
-
On page 6 the judge writes that the LLM "memorized" the content and could "recite" it.
Neither is true in the training or use of LLMs.
Depends on the content and the method. There are tons of ways to encode data, and under relevant law the results may still count as copies. There are certainly weaker NN models where we can extract a lot of the training data from the model parameters, even if it's not easy, and even if we can't find a prompt that gets the model to regurgitate it.
-
IMO the focus should have always been on the potential for AI to produce copyright-violating output, not on the method of training.
Plaintiffs made that argument, and the judge shoots it down pretty hard. Competition isn't what copyright protects against. He makes an analogy with teachers teaching children to write fiction: they are using existing fantasy to create MANY more competitors on the fiction market. Could an author use copyright to challenge that use?
Would love to hear your thoughts on the ruling itself (it's linked by reuters).
-
It means what it means; "freely" pulls its own weight. I didn't say "readily" accessible. Torrents could be viewed as "readily" accessible, but they couldn't be viewed as "freely" accessible, because at the very least you bear the guilt of theft. Library books are "freely" accessible, and if somehow the training involved checking out books and returning them digitally, it should be fine. If it is free to read into neurons, it is free to read into neural systems. If payment for reading is expected, then it isn't free.
Civil cases of copyright infringement are not theft, no matter what the MPAA has trained you to believe.
-
I also read through the judgment, and I think it's better for Anthropic than you describe. He distinguishes three issues:
A) Use any written material they get their hands on to train the model (and the resulting model doesn't just reproduce the works).
B) Buy a single copy of a print book, scan it, and retain the digital copy for a company library (for all sorts of future purposes).
C) Pirate a book and retain that copy for a company library (for all sorts of future purposes).
A and B were fair use on summary judgment, meaning this judge thinks it's clear-cut in Anthropic's favor. C will go to trial.
C could still bankrupt the company depending on how the trial goes. They pirated a lot of books.
-
Because books are used to train both commercial and open source language models?
used to train both commercial
commercial training is, in this case, stealing people's work for commercial gain
and open source language models
so, uh, let us train open-source models on open-source text. There's so much of it that there's no need to steal.
?
I'm not sure why you added a question mark at the end of your statement.
-
Civil cases of copyright infringement are not theft, no matter what the MPAA has trained you to believe.
But they are copyright infringement, which carries steeper penalties than theft.
-
Because of the vast amount of data needed, there will be no competitive viable open source solution if half the data is kept in a walled garden.
This is about open weights vs closed weights.
They haven't dewalled the garden yet. The copyright infringement part of the case will continue.
-
What, how is this a win? Three authors lost a lawsuit to an AI firm using their works.
It would harm the AI industry if Anthropic loses the next part of the trial, on whether they pirated books. From what I've read, Anthropic and Meta are suspected of getting a lot off torrent sites and the like.
It's possible they all did some piracy in their mad dash to find training material, but Amazon and Google have bookstores, and Google even has a book text search engine (Google Books), Google Scholar, and probably everything else already in its data centers. So I'm not sure why they'd have to resort to piracy.
-
C could still bankrupt the company depending on how the trial goes. They pirated a lot of books.
As a civil matter, the publishing houses are more likely to get the full money if Anthropic stays in business (and does well). So it might be bad, but I'm really skeptical about bankruptcy (and I'm not hearing anyone seriously floating it?).
-
This post did not contain any content.
Anakin: “Judge backs AI firm over use of copyrighted books”
Padme: “But they’ll be held accountable when they reproduce parts of those works or compete with the work they were trained on, right?”
Anakin: “…”
Padme: “Right?”
-
Because of the vast amount of data needed, there will be no competitive viable open source solution if half the data is kept in a walled garden.
This is about open weights vs closed weights.
I agree that we need open source and to emancipate ourselves. The main issue I see is that the entire approach doesn't work. Take the internet as an example: it's meant to be very open, to connect everyone and enable them to share information freely. It is set up to be a level playing field... Now look what that leads to: trillion-dollar mega-corporations, privacy issues everywhere and big data silos. That's what the approach promotes. I agree with the goal, but in my opinion the approach will turn out to lead to less open source and more control by rich companies. And that's not what we want.
Plus, nobody even opens the walled gardens. Last time I looked, Reddit wanted money for its data. Other big platforms aren't open either. And there's a small war going on between the scrapers and crawlers and the anti-scraping measures. So it's not as if it's open as of now.
-
This post did not contain any content.
Pirate everything!