linux-nerds.org

Your browser does not seem to support JavaScript. As a result, your viewing experience will be diminished, and you have been placed in read-only mode.

Please download a browser that supports JavaScript, or enable it if it's disabled (i.e. NoScript).

Judge Rules Training AI on Authors' Books Is Legal But Pirating Them Is Not

Technology

254 Beiträge 123 Kommentatoren 1.9k Aufrufe

P prox@lemmy.world

FTA:

Anthropic warned against “[t]he prospect of ruinous statutory damages—$150,000 times 5 million books”: that would mean $750 billion.

So part of their argument is actually that they stole so much that it would be impossible for them/anyone to pay restitution, therefore we should just let them off the hook.
I This user is from outside of this forum
I This user is from outside of this forum
interdimensionalmeme@lemmy.ml

schrieb zuletzt editiert von

#137

What is means is they don't own the models. They are the commons of humanity, they are merely temporary custodians. The nightnare ending is the elites keeping the most capable and competent models for themselves as private play things. That must not be allowed to happen under any circumstances. Sue openai, anthropic and the other enclosers, sue them for trying to take their ball and go home. Disposses them and sue the investors for their corrupt influence on research.
1 Antwort Letzte Antwort

0
B bjoern_tantau@swg-empire.de

And thus the singularity was born.
I This user is from outside of this forum
I This user is from outside of this forum
interdimensionalmeme@lemmy.ml

schrieb zuletzt editiert von

#138

Yes please a singularity of intellectual property that collapses the idea of ownong ideas. Of making the infinitely freely copyableinto a scarce ressource. What corrupt idiocy this has been. Landlords for ideas and look what garbage it has been producing.
1 Antwort Letzte Antwort

2
W wraithgear@lemmy.world

Yes, on the second part. Just rearranging or replacing words in a text is not transformative, which is a requirement. There is an argument that the ‘AI’ are capable of doing transformative work, but the tokenizing and weight process is not magic and in my use of multiple LLM’s they do not have an understanding of the material any more then a dictionary understands the material printed on its pages.

An example was the wine glass problem. Art ‘AI’s were unable to display a wine glass filled to the top. No matter how it was prompted, or what style it aped, it would fail to do so and report back that the glass was full. But it could render a full glass of water. It didn’t understand what a full glass was, not even for the water. How was this possible? Well there was very little art of a full wine glass, because society has an unspoken rule that a full wine glass is the epitome of gluttony, and it is to be savored not drunk. Where as the reference of full glasses of water were abundant. It doesn’t know what full means, just that pictures of full glass of water are tied to phrases full, glass, and water.
L This user is from outside of this forum
L This user is from outside of this forum
lifeinmultiplechoice@lemmy.world

schrieb zuletzt editiert von lifeinmultiplechoice@lemmy.world

#139

Yeah, we had a fun example a while ago, let me see if I can still find it.

We would ask to create a photo of a cat with no tail.

And then tell it there was indeed a tail, and ask it to draw an arrow to point to it.

It just points to where the tail most commonly is, or was said to be in a picture it was not referencing.

Edit: granted now, it shows a picture of a cat where you just can't see the tail in the picture.
1 Antwort Letzte Antwort

1
F freedomadvocate@lemmy.net.au

Makes sense. AI can “learn” from and “read” a book in the same way a person can and does, as long as it is acquired legally. AI doesn’t reproduce a work that it “learns” from, so why would it be illegal?

Some people just see “AI” and want everything about it outlawed basically. If you put some information out into the public, you don’t get to decide who does and doesn’t consume and learn from it. If a machine can replicate your writing style because it could identify certain patterns, words, sentence structure, etc then as long as it’s not pretending to create things attributed to you, there’s no issue.
A This user is from outside of this forum
A This user is from outside of this forum
antonim@lemmy.dbzer0.com

schrieb zuletzt editiert von

#140

AI can “learn” from and “read” a book in the same way a person can and does,

If it's in the same way, then why do you need the quotation marks? Even you understand that they're not the same.

And either way, machine learning is different from human learning in so many ways it's ridiculous to even discuss the topic.

AI doesn’t reproduce a work that it “learns” from

That depends on the model and the amount of data it has been trained on. I remember the first public model of ChatGPT producing a sentence that was just one word different from what I found by googling the text (from some scientific article summary, so not a trivial sentence that could line up accidentally). More recently, there was a widely reported-on study of AI-generated poetry where the model was requested to produce a poem in the style of Chaucer, and then produced a letter-for-letter reproduction of the well-known opening of the Canterbury Tales. It hasn't been trained on enough Middle English poetry and thus can't generate any of it, so it defaulted to copying a text that probably occurred dozens of times in its training data.
1 Antwort Letzte Antwort

2
L loreleisanktheship@lemmy.ml

As long as they don't use exactly the same words in the book, yeah, as I understand it.
V This user is from outside of this forum
V This user is from outside of this forum
vane@lemmy.world

schrieb zuletzt editiert von vane@lemmy.world

#141

How they don't use same words as in the book ? That's not how LLM works. They use exactly same words if the probabilities align. It's proved by this study. https://arxiv.org/abs/2505.12546
S N 2 Antworten Letzte Antwort

1
A alsimoneau@lemmy.ca

Copilot did it just fine
A This user is from outside of this forum
A This user is from outside of this forum
antonim@lemmy.dbzer0.com

schrieb zuletzt editiert von

#142

Bro are you a robot yourself? Does that look like a glass full of wine?
A 1 Antwort Letzte Antwort

1
G gaja@lemm.ee

I am educated on this. When an ai learns, it takes an input through a series of functions and are joined at the output. The set of functions that produce the best output have their functions developed further. Individuals do not process information like that. With poor exploration and biasing, the output of an AI model could look identical to its input. It did not "learn" anymore than a downloaded video ran through a compression algorithm.
E This user is from outside of this forum
E This user is from outside of this forum
enkimaru@lemmy.world

schrieb zuletzt editiert von

#143

You are obviously not educated on this.

It did not “learn” anymore than a downloaded video ran through a compression algorithm.
Just: LoLz.
H G 2 Antworten Letzte Antwort

3
F forkdestroyer@infosec.pub

Make an AI that is trained on the books.

Tell it to tell you a story for one of the books.

Read the story without paying for it.

The law says this is ok now, right?
E This user is from outside of this forum
E This user is from outside of this forum
enkimaru@lemmy.world

schrieb zuletzt editiert von

#144

The LLM is not repeating the same book. The owner of the LLM has the exact same rights to do with what his LLM is reading, as you have to do with what ever YOU are reading.

As long as it is not a verbatim recitation, it is completely okay.

According to story telling theory: there are only roughly 15 different story types anyway.
1 Antwort Letzte Antwort

4
E enkimaru@lemmy.world

You are obviously not educated on this.

It did not “learn” anymore than a downloaded video ran through a compression algorithm.
Just: LoLz.
H This user is from outside of this forum
H This user is from outside of this forum
hoppolito@mander.xyz

schrieb zuletzt editiert von

#145

I am not sure what your contention, or gotcha, is with the comment above but they are quite correct. And additionally chose quite an apt example with video compression since in most ways current 'AI' effectively functions as a compression algorithm, just for our language corpora instead of video.
N 1 Antwort Letzte Antwort

3
L lovablesidekick@lemmy.world

Lawsuits are multifaceted. This statement isn't a a defense or an argument for innocence, it's just what it says - an assertion that the proposed damages are unreasonably high. If the court agrees, the plaintiff can always propose a lower damage claim that the court thinks is reasonable.
T This user is from outside of this forum
T This user is from outside of this forum
thistlewick@lemmynsfw.com

schrieb zuletzt editiert von

#146

You’re right, each of the 5 million books’ authors should agree to less payment for their work, to make the poor criminals feel better.

If I steal $100 from a thousand people and spend it all on hookers and blow, do I get out of paying that back because I don’t have the funds? Should the victims agree to get $20 back instead because that’s more within my budget?
L W 2 Antworten Letzte Antwort

7
D devils_advocate@sh.itjust.works

Does buying the book give you license to digitise it?

Does owning a digital copy of the book give you license to convert it into another format and copy it into a database?

Definitions of "Ownership" can be very different.
V This user is from outside of this forum
V This user is from outside of this forum
voterfrog@lemmy.world

schrieb zuletzt editiert von voterfrog@lemmy.world

#147

It seems like a lot of people misunderstand copyright so let's be clear: the answer is yes. You can absolutely digitize your books. You can rip your movies and store them on a home server and run them through compression algorithms.

Copyright exists to prevent others from redistributing your work so as long as you're doing all of that for personal use, the copyright owner has no say over what you do with it.

You even have some degree of latitude to create and distribute transformative works with a violation only occurring when you distribute something pretty damn close to a copy of the original. Some perfectly legal examples: create a word cloud of a book, analyze the tone of news article to help you trade stocks, produce an image containing the most prominent color in every frame of a movie, or create a search index of the words found on all websites on the internet.

You can absolutely do the same kinds of things an AI does with a work as a human.
1 Antwort Letzte Antwort

11
B bytesonbike@discuss.online

That's legal just don't look at them or enjoy them.
A This user is from outside of this forum
A This user is from outside of this forum
antonim@lemmy.dbzer0.com

schrieb zuletzt editiert von

#148

Yeah, I don't think that would fly.

"Your honour, I was just hoarding that terabyte of Hollywood films, I haven't actually watched them."
R 1 Antwort Letzte Antwort

0
V vane@lemmy.world

How they don't use same words as in the book ? That's not how LLM works. They use exactly same words if the probabilities align. It's proved by this study. https://arxiv.org/abs/2505.12546
S This user is from outside of this forum
S This user is from outside of this forum
sufferingsteve@feddit.nu

schrieb zuletzt editiert von

#149

The "if" is working overtime in your statement
1 Antwort Letzte Antwort

4
S snekerpimp@lemmy.snekerpimp.space

“I torrented all this music and movies to train my local ai models”
V This user is from outside of this forum
V This user is from outside of this forum
vane@lemmy.world

schrieb zuletzt editiert von vane@lemmy.world

#150

This is not pirated music. It's AI generated. The fact that it sounds and is named the same is just coincidence.
1 Antwort Letzte Antwort

0
D deathsembrace@lemmy.world

So I can't use any of these works because it's plagiarism but AI can?
N This user is from outside of this forum
N This user is from outside of this forum
nednobbins@lemmy.zip

schrieb zuletzt editiert von

#151

That's not what it says.

Neither you nor an AI is allowed to take a book without authorization; that includes downloading and stealing it. That has nothing to do with plagiarism; it's just theft.

Assuming that the book has been legally obtained, both you and an AI are allowed to read that book, learn from it, and use the knowledge you obtained.

Both you and the AI need to follow existing copyright laws and licensing when it comes to redistributing that work.

"Plagiarism" is the act of claiming someone else's work as your own and it's orthogonal to the use of AI. If you ask either a human or an AI to produce an essay on the philosophy surrounding suicide, you're fairly likely to include some Shakespeare quotes. It's only plagiarism if you or the AI fail to provide attribution.
1 Antwort Letzte Antwort

9
E enkimaru@lemmy.world

You can digitize the books you own. You do not need a license for that. And of course you could put that digital format into a database. As databases are explicit exceptions from copyright law. If you want to go to the extreme: delete first copy. Then you have only in the database. However: AIs/LLMs are not based on data bases. But on neural networks. The original data gets lost when "learned".
N This user is from outside of this forum
N This user is from outside of this forum
nednobbins@lemmy.zip

schrieb zuletzt editiert von

#152

If you want to go to the extreme: delete first copy.

You can; as I understand it, the only legal requirement is that you only use one copy at a time.

ie. I can give my book to a friend after I'm done reading it; I can make a copy of a book and keep them at home and at the office and switch off between reading them; I'm not allowed to make a copy of the book hand one to a friend and then both of us read it at the same time.
G 1 Antwort Letzte Antwort

3
H hoppolito@mander.xyz

I am not sure what your contention, or gotcha, is with the comment above but they are quite correct. And additionally chose quite an apt example with video compression since in most ways current 'AI' effectively functions as a compression algorithm, just for our language corpora instead of video.
N This user is from outside of this forum
N This user is from outside of this forum
nednobbins@lemmy.zip

schrieb zuletzt editiert von

#153

They seem pretty different to me.

Video compression developers go through a lot of effort to make them deterministic. We don't necessarily care that a particular video stream compresses to a particular bit sequence but we very much care that the resulting decompression gets you as close to the original as possible.

AIs will rarely produce exact replicas of anything. They synthesize outputs from heterogeneous training data. That sounds like learning to me.

The one area where there's some similarity is dimensionality reduction. Its technically a form of compression, since it makes your files smaller. It would also be an extremely expensive way to get extremely bad compression. It would take orders of magnitude more hardware resources and the images are likely to be unrecognizable.
G 1 Antwort Letzte Antwort

4
F facedeer@fedia.io

The judge isn't saying that they learn or that they're people. He's saying that training falls into the same legal classification as learning.
D This user is from outside of this forum
D This user is from outside of this forum
dojan@pawb.social

schrieb zuletzt editiert von

#154

Which doesn’t make any sense.
1 Antwort Letzte Antwort

0
D deathsembrace@lemmy.world

It cant speak or use any words without it being someone elses words it learned from? Unless its giving sources everything is always from something it learned because it cannot speak or use words without that source in the first place?
N This user is from outside of this forum
N This user is from outside of this forum
nednobbins@lemmy.zip

schrieb zuletzt editiert von

#155
prompt: Make up a word that is not found anywhere on the internet. If I do a search for that word, it should not exist. Provide example usage.

Qwen3-235B-A22B:

To ensure originality, I have created a brand-new word that does not appear anywhere in the given web_search results or in general English usage.
Virethra (pronounced: vye-RETH-ruh)

Part of speech: noun

Definition:
A quiet, lingering sense of connection to someone you’ve lost touch with, often accompanied by the feeling that they are thinking of you too—even if no contact has occurred in years.

Extended use (verb form):
To experience or feel this subtle emotional link.
Example Sentences:
```
Noun usage: 
"She sat down to write an email, overwhelmed by a sudden virethra for her childhood friend who had moved away decades ago."  

Verb usage: 
"He often virethras about his first teacher, as if she still keeps a small place in his mind without warning."  

Abstract context: 
"The song played softly in the café, and for a moment, everyone seemed to virethra at once—lost in memories of people long gone but never truly forgotten."  
```
K 1 Antwort Letzte Antwort

1
V vane@lemmy.world

Ok so you can buy books scan them or ebooks and use for AI training but you can't just download priated books from internet to train AI. Did I understood that correctly ?
N This user is from outside of this forum
N This user is from outside of this forum
nednobbins@lemmy.zip

schrieb zuletzt editiert von

#156

That's my understanding too. If you obtained them legally, you can use them the same way anyone else who obtained them legally could use them.
1 Antwort Letzte Antwort

3

Anmelden zum Antworten

T

Software is evolving backwards
Beobachtet Ignoriert Geplant Angeheftet Gesperrt Verschoben Technology technology
64

1

341 Stimmen

64 Beiträge

405 Aufrufe

M

Came here looking for this
D

The Guardian and Cambridge University's Department of Computer Science unveil new secure technology to protect sources
Beobachtet Ignoriert Geplant Angeheftet Gesperrt Verschoben Technology technology
64

1

327 Stimmen

64 Beiträge

402 Aufrufe

B

I get that, but it's more logical to me that of I'm going to whistleblow on a company to not use one of their devices to do it. That way it doesn't matter what apps are or are not secure, you're not using their device that can potentially track you.
R

[DHS]Threat to the United States. "Both hacktivists and Iranian government-affiliated actors routinely target poorly secured US networks and Internet-connected devices for disruptive cyber attacks."
Beobachtet Ignoriert Geplant Angeheftet Gesperrt Verschoben Technology technology
22

103 Stimmen

22 Beiträge

118 Aufrufe

T

Nobody is ignoring these imperial invasions, genocides, etc. USA is actively supporting them.
P

Google’s Advanced Protection Arrives on Android: Should You Use It?
Beobachtet Ignoriert Geplant Angeheftet Gesperrt Verschoben Technology technology
17

1

61 Stimmen

17 Beiträge

85 Aufrufe

A

I’ll probably never trust anything they’ve touched until I’ve taken it apart and put it back together again. Me too. But the vast majority of users need guardrails, and have a different threat model. Even those that also care about privacy, if they just want a solution that comes by default, this adtech 'fake' or 'superficial' solution does provide something. And anything is more than nothing.
P

Adaptive Keyboards & Writing Technologies For One-Handed Users
Beobachtet Ignoriert Geplant Angeheftet Gesperrt Verschoben Technology technology
5

1

111 Stimmen

5 Beiträge

37 Aufrufe

T

Came here to say this.
P

You are Already On "The List"
Beobachtet Ignoriert Geplant Angeheftet Gesperrt Verschoben Technology technology
2

47 Stimmen

2 Beiträge

22 Aufrufe

M

Even if they're wrong. It's too late. You're already on the list. .... The only option is to destroy the list and those who will use it
P

Paradromics implanted and removed its Connexus brain implant in a patient during epilepsy surgery, a first for the Neuralink rival
Beobachtet Ignoriert Geplant Angeheftet Gesperrt Verschoben Technology technology
1

6 Stimmen

1 Beiträge

12 Aufrufe

Niemand hat geantwortet
P

[UK] Live facial recognition cameras may become ‘commonplace’ as police use soars
Beobachtet Ignoriert Geplant Angeheftet Gesperrt Verschoben Technology technology
12

1

36 Stimmen

12 Beiträge

62 Aufrufe

C

Definitely don't want to be painting my face every day