Judge Rules Training AI on Authors' Books Is Legal But Pirating Them Is Not
-
Google search results aren't deterministic, but I wouldn't say it "learns" like a person. Algorithms with pattern detection aren't the same as human learning.
You may be correct but we don't really know how humans learn.
There's a ton of research on it and a lot of theories but no clear answers.
There's general agreement that the brain is a bunch of neurons; there are no convincing ideas on how consciousness arises from that mass of neurons.
The brain also has a bunch of chemicals that affect neural processing; there are no convincing ideas on how that gets you consciousness either.
We modeled perceptrons after neurons and we've been working to make them more like neurons. Neurons don't have any obvious capabilities that perceptrons don't have.
That's the big problem with any claim that "AI doesn't do X like a person"; since we don't know how people do it we can neither verify nor refute that claim.
There's more to AI than just being non-deterministic. Anything that's too deterministic definitely isn't an intelligence, though, whether natural or artificial. Video compression algorithms are definitely very far removed from AI.
-
why do you even jailbreak your kindle? you can still read pirated books on them if you connect it to your pc using calibre
Hehe jailbreak an Android OS. You mean “rooting”.
-
This post did not contain any content.
Judge, I'm pirating them to train AI, not to consume for my own personal use.
-
Good luck breaking down people's doors for scanning their own physical books for their personal use when analog media has no DRM and can't phone home, and paper books are an analog medium.
That would be like kicking down people's doors for needle-dropping their LPs to FLAC for their own use and to preserve the physical records as vinyl wears down every time it's played back.
The ruling explicitly says that scanning books and keeping/using those digital copies is legal.
The piracy found to be illegal was downloading unauthorized copies of books from the internet for free.
-
Good luck breaking down people's doors for scanning their own physical books for their personal use when analog media has no DRM and can't phone home, and paper books are an analog medium.
That would be like kicking down people's doors for needle-dropping their LPs to FLAC for their own use and to preserve the physical records as vinyl wears down every time it's played back.
It sounds like transferring an owned print book to digital and using it to train AI was deemed permissible. But downloading a book from the Internet and using it as training data is not allowed, even if you later purchase the pirated book. So, no one will be knocking down your door for scanning your books.
This does raise an interesting case where libraries could end up training and distributing public domain AI models.
-
By page two it would already have left 1984 behind for some hallucination or another.
Oh, so it would be the news?
-
The ruling explicitly says that scanning books and keeping/using those digital copies is legal.
The piracy found to be illegal was downloading unauthorized copies of books from the internet for free.
I wonder if the archive.org cases had any bearing on the decision.
-
Does it "generate" a 1:1 copy?
You can train an LLM to generate 1:1 copies
-
why do you even jailbreak your kindle? you can still read pirated books on them if you connect it to your pc using calibre
when not in use i have it load images from my local webserver that are generated by some scripts and feature local news or the weather. kindle screensaver sucks.
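For anyone curious, the setup described above can be sketched in a few lines. Everything specific here is an assumption, not the poster's actual setup: the generator script name (`make_image.sh`), the image path, and the port are all hypothetical; the jailbroken screensaver hack just needs a URL to fetch.

```python
import http.server
import subprocess

IMAGE_PATH = "/tmp/screensaver.png"  # hypothetical output path of the generator script

class ScreensaverHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # Re-run the (hypothetical) generator so the image always shows
        # current news/weather, then serve the resulting file bytes.
        subprocess.run(["./make_image.sh"], check=False)
        try:
            with open(IMAGE_PATH, "rb") as f:
                body = f.read()
        except FileNotFoundError:
            self.send_error(404, "image not generated yet")
            return
        self.send_response(200)
        self.send_header("Content-Type", "image/png")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To serve (blocks forever):
# http.server.HTTPServer(("", 8080), ScreensaverHandler).serve_forever()
```

Regenerating on every GET keeps the Kindle side dumb: it only has to poll one URL on its screensaver timer.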
-
You may be correct but we don't really know how humans learn.
There's a ton of research on it and a lot of theories but no clear answers.
There's general agreement that the brain is a bunch of neurons; there are no convincing ideas on how consciousness arises from that mass of neurons.
The brain also has a bunch of chemicals that affect neural processing; there are no convincing ideas on how that gets you consciousness either.
We modeled perceptrons after neurons and we've been working to make them more like neurons. Neurons don't have any obvious capabilities that perceptrons don't have.
That's the big problem with any claim that "AI doesn't do X like a person"; since we don't know how people do it we can neither verify nor refute that claim.
There's more to AI than just being non-deterministic. Anything that's too deterministic definitely isn't an intelligence, though, whether natural or artificial. Video compression algorithms are definitely very far removed from AI.
One point I would refute here is determinism. AI models are, by default, deterministic. They are made from deterministic parts and "any combination of deterministic components will result in a deterministic system". Randomness has to be externally injected into e.g. current LLMs to produce 'non-deterministic' output.
There is the notable exception of newer models like GPT-4, which seemingly produce non-deterministic outputs (i.e. give it the same sentence and it produces different outputs even with its temperature set to 0) - but my understanding is this is due to floating-point inaccuracies that lead to different token selection, and is thus a function of our current processor architectures and not inherent in the model itself.
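A toy decoding step makes the point concrete (a minimal sketch, not any particular vendor's sampler): the forward pass produces the same logits every time; non-determinism only appears when randomness is injected at the sampling stage.

```python
import math
import random

def sample_token(logits, temperature=1.0, rng=None):
    """Pick a next-token index from raw logits.

    temperature == 0 collapses to greedy argmax, which is fully
    deterministic; any temperature > 0 softmax-samples, so randomness
    has to be supplied from outside the model (here via `rng`).
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random()
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    r = rng.random()
    acc = 0.0
    for i, e in enumerate(exps):
        acc += e / total
        if r < acc:
            return i
    return len(exps) - 1

# Greedy decoding always picks the highest-logit token:
print(sample_token([2.0, 1.0, 0.5], temperature=0))  # -> 0
```

With a seeded `rng` even the temperature > 0 path is reproducible, which is exactly the sense in which the randomness is external to the model.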
-
Facebook (Meta) torrented TBs from Libgen, and their internal chats leaked so we know about that, and IIRC they've been sued. Maybe you're thinking of that case?
Billions of dollars, and they can't afford to buy ebooks?
-
This post did not contain any content.
It took me a few days to get the time to read the actual court ruling but here's the basics of what it ruled (and what it didn't rule on):
- It's legal to scan physical books you already own and keep a digital library of those scanned books, even if the copyright holder didn't give permission. And even if you bought the books used, for very cheap, in bulk.
- It's legal to keep all the book data in an internal database for use within the company, as a central library of works accessible only within the company.
- It's legal to prepare those digital copies for potential use as training material for LLMs, including recognizing the text, performing cleanup on scanning/recognition errors, categorizing and cataloguing them to make editorial decisions on which works to include in which training sets, tokenizing them for the actual LLM technology, etc. This remains legal even for the copies that are excluded from training for whatever reason, as the entire bulk process may involve text that ends up not being used, but the process itself is fair use.
- It's legal to use that book text to create large language models that power services that are commercially sold to the public, as long as there are safeguards that prevent the LLMs from publishing large portions of a single copyrighted work without the copyright holder's permission.
- It's illegal to download unauthorized copies of copyrighted books from the internet, without the copyright holder's permission.
Here's what it didn't rule on:
- Is it legal to distribute large chunks of copyrighted text through one of these LLMs, such as when a user asks a chatbot to recite an entire copyrighted work that is in its training set? (The opinion suggests that it probably isn't legal, and relies heavily on the dividing line of how Google Books does it, by scanning and analyzing an entire copyrighted work but blocking users from retrieving more than a few snippets from those works).
- Is it legal to give anyone outside the company access to the digitized central library assembled by the company from printed copies?
- Is it legal to crawl publicly available digital data to build a library from text already digitized by someone else? (The answer may matter depending on whether there is an authorized method for obtaining that data, or whether the copyright holder refuses to license that copying).
So it's a pretty important ruling, in my opinion. It's a clear green light to the idea of digitizing and archiving copyrighted works without the copyright holder's permission, as long as you first own a legal copy in the first place. And it's a green light to using copyrighted works for training AI models, as long as you compiled that database of copyrighted works in a legal way.
-
Check out my new site TheAIBay, you search for content and an LLM that was trained on reproducing it gives it to you, a small hash check is used to validate accuracy. It is now legal.
The court's ruling explicitly depended on the fact that Anthropic does not allow users to retrieve significant chunks of copyrighted text. It used the entire copyrighted work to train the weights of the LLMs, but is configured not to actually copy those works out to the public user. The ruling says that if the copyright holders later develop evidence that it is possible to retrieve entire copyrighted works, or significant portions of a work, then they will have the right to sue over those facts.
But the facts before the court were that Anthropic's LLMs have safeguards against distributing copies of identifiable copyrighted works to its users.
-
Does buying the book give you license to digitise it?
Does owning a digital copy of the book give you license to convert it into another format and copy it into a database?
Definitions of "Ownership" can be very different.
Does buying the book give you license to digitise it?
Does owning a digital copy of the book give you license to convert it into another format and copy it into a database?
Yes. That's what the court ruled here. If you legally obtain a printed copy of a book you are free to digitize it or archive it for yourself. And you're allowed to keep that digital copy, analyze and index it and search it, in your personal library.
Anthropic's practice of buying physical books, removing the bindings, scanning the pages, and digitizing the content while destroying the physical book was found to be legal, so long as Anthropic didn't distribute that library outside of its own company.
-
Make an AI that is trained on the books.
Tell it to tell you a story for one of the books.
Read the story without paying for it.
The law says this is ok now, right?
The law says this is ok now, right?
No.
The judge accepted the fact that Anthropic prevents users from obtaining the underlying copyrighted text through interaction with its LLM, and that there are safeguards in the software that prevent a user from being able to get an entire copyrighted work out of that LLM. It discusses the Google Books arrangement, where the books are scanned in their entirety, but where a user searching in Google Books can't actually retrieve more than a few snippets from any given book.
Anthropic gets to keep its copy of the entire book. It doesn't get to transmit the contents of that book to someone else, even through the LLM service.
The judge also explicitly stated that if the authors can put together evidence that it is possible for a user to retrieve their entire copyrighted work out of the LLM, they'd have a different case and could sue over it at that time.
-
But if one person buys a book, trains an "AI model" to recite it, then distributes that model we good?
No. The court made its ruling with the explicit understanding that the software was configured not to recite more than a few snippets from any copyrighted work, and would never produce an entire copyrighted work (or even a significant portion of a copyrighted work) in its output.
And the judge specifically reserved that question, saying if the authors could develop evidence that it was possible for a user to retrieve significant copyrighted material out of the LLM, they'd have a different case and would be able to sue under those facts.
-
You're poor? Fuck you, you have to pay to breathe.
Millionaire? Whatever you want daddy uwu
That's kind of how I read it too.
But as a side effect it means you're still allowed to photograph your own books at home as a private citizen if you own them.
Prepare to never legally own another piece of media in your life.
-
Yes, and that part of the case is going to trial. This was a preliminary judgment specifically about the training itself.
specifically about the training itself.
It's two issues being ruled on.
Yes, as you mention, the act of training an LLM was ruled to be fair use, assuming that the digital training data was legally obtained.
The other part of the ruling, which I think is really, really important for everyone, not just AI/LLM companies or developers, is that it is legal to buy printed books and digitize them into a central library with indexed metadata. Anthropic has to go to trial on the pirated books they just downloaded from the internet, but has fully won the portion of the case about the physical books they bought and digitized.
-
I am not a lawyer. I am talking about reality.
What does an LLM application (or training processes associated with an LLM application) have to do with the concept of learning? Where is the learning happening? Who is doing the learning?
Who is stopping the individuals at the LLM company from learning or analysing a given book?
From my experience living in the US, this is pretty standard American-style corruption. Lots of pomp and bombast and roleplay of sorts, but the outcome is no different from any other country that is in deep need of judicial and anti-corruption reform.
What does an LLM application (or training processes associated with an LLM application) have to do with the concept of learning?
No, you're framing the issue incorrectly.
The law concerns itself with copying. When humans learn, they inevitably copy things. They may memorize portions of copyrighted material, and then retrieve those memories in doing something new with them, or just by recreating it.
If the argument is that the mere act of copying for training an LLM is illegal copying, then what would we say about the use of copyrighted text for teaching children? They will memorize portions of what they read. They will later write some of them down. And if there is a person who memorizes an entire poem (or song) and then writes it down for someone else, that's actually a copyright violation. But if they memorize that poem or song and reuse it in creating something new and different, but with links and connections to that previous copyrighted work, then that kind of copying and processing is generally allowed.
The judge here is analyzing what exact types of copying are permitted under the law, and for that, the copyright holders' argument would sweep too broadly and prohibit all sorts of methods that humans use to learn.
-
FTA:
Anthropic warned against “[t]he prospect of ruinous statutory damages—$150,000 times 5 million books”: that would mean $750 billion.
So part of their argument is actually that they stole so much that it would be impossible for them/anyone to pay restitution, therefore we should just let them off the hook.
The problem isn't that Anthropic gets to use that defense, it's that others don't. The fact that the world is in a place where people can be fined 5+ years of the average western European salary for making a copy of one (1) book that does not materially affect the copyright holder in any way is insane, and it is good to point that out no matter who does it.
-