Judge Rules Training AI on Authors' Books Is Legal But Pirating Them Is Not
-
Bro are you a robot yourself? Does that look like a glass full of wine?
If someone asks for a glass of water, you don't fill it all the way to the edge. This is way overfull compared to what you're supposed to serve.
-
1 it’s not full, but closer than it was.
- I specifically said that the AI was unable to do it until someone made a reference specifically so that it could start passing the test, so it’s a little bit late to prove much.
The concept of a glass being full and of a liquid being wine can probably be separated fairly well. I assume that as models got more complex they started being able to do this more.
-
I'd say there are two issues with it.
First, it's a very new article with only 3 citations. The authors seem like serious researchers, but the paper itself is still in the "hot off the presses" stage and wouldn't qualify as "proven" yet.
It also doesn't exactly say that books are copies. It says that in some models, it's possible to extract some portions of some texts. They cite "1984" and "Harry Potter" as two books that can be extracted almost entirely, under some circumstances. They also find that, in general, extraction rates are below 1%.
Yeah, but it's just a start to reverse the process and prove that there is no AI. We only started with generating text; I bet people will figure out how to reverse the process by using some sort of Rosetta Stone. It's just probabilities after all.
-
“it was unable to link the concepts until it was literally created for it to regurgitate it out“
-WraithGear
The problem was solved before their patch. But the article just said that the model is changed by running it through a post check, just like what DeepSeek does. It does not talk about the fundamental flaw in how it creates; they assert it does, like they always did.
-
For the purposes of this ruling it doesn't actually matter. The Authors claimed that this was the case and the judge said "sure, for purposes of argument I'll assume that this is indeed the case." It didn't change the outcome.
I mean, they can assume fantasy, and it will hold weight because laws are interpreted by the court, not because the court is correct.
-
i will train my jailbroken kindle too...display and storage training... i'll just libgen them...no worries...it is not piracy
Of course we have to have a way to manually check the training data, in detail, as well. Not reading the book, I'm just verifying training data.
-
Yeah, but it's just a start to reverse the process and prove that there is no AI. We only started with generating text; I bet people will figure out how to reverse the process by using some sort of Rosetta Stone. It's just probabilities after all.
That's possible but it's not what the authors found.
They spend a fair amount of the conclusion emphasizing how exploratory and ambiguous their findings are. The researchers themselves are very careful to point out that this is not a smoking gun.
-
Sure, if you purchase your training material, it's not a copyright infringement to read it.
We needed a judge for this?
-
The concept of a glass being full and of a liquid being wine can probably be separated fairly well. I assume that as models got more complex they started being able to do this more.
You mean when the training data becomes more complete. But that’s the thing: when this issue was being tested, the ‘AI’ would swear up and down that the normally filled wine glasses were full. When it was pointed out that the glass was not in fact full, the ‘AI’ would agree, then change some other aspect of the picture it didn’t fully understand. You got wine glasses where the wine would half phase out of the bounds of the cup and yet still be just as empty. No amount of additional checks will help without an appropriate reference.
I use ‘AI’ extensively. I have one running locally on my computer, which I swap out from time to time. I don’t have anything against its use, with certain exceptions. But I cannot stand people personifying it beyond its scope.
Here is a good example. I am working on an app, so every once in a while I will send it code to check. But I have to be very careful. The code it spits out will be unoptimized, like: variable1 = IF(variable2 IS true, true, false).
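For illustration only, here is a minimal Python sketch of the redundant pattern described above next to its simpler equivalent. The names `variable1` and `variable2` come from the comment; the function names `redundant` and `simplified` are made up for this example, and the sketch assumes `variable2` is already a boolean.

```python
def redundant(variable2: bool) -> bool:
    # Mirrors the pattern from the comment: an explicit branch that
    # only restates the boolean it is testing.
    variable1 = True if variable2 is True else False
    return variable1


def simplified(variable2: bool) -> bool:
    # Assumption: variable2 is already a bool, so the branch adds nothing.
    variable1 = variable2
    return variable1


# Both versions agree on every boolean input.
for value in (True, False):
    assert redundant(value) == simplified(value)
```

The two only match when the input really is a boolean; if `variable2` could hold some other value, the explicit comparison would not be a pure no-op.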
Some have issues with object permanence, or with the consideration of time outside their training data. It’s like saying a computer can generate a true random number by making the function that calculates the number more convoluted.
-
That's possible but it's not what the authors found.
They spend a fair amount of the conclusion emphasizing how exploratory and ambiguous their findings are. The researchers themselves are very careful to point out that this is not a smoking gun.
Yeah, the authors rely on the recent DeepMind paper https://aclanthology.org/2025.naacl-long.469.pdf (they even cite it) that describes (n, p)-discoverable extraction. These are recent studies because right now there are no boundaries; basically people made something and now they study their creation. We're probably years from something like GDPR for LLMs.
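A rough sketch of the idea behind (n, p)-discoverable extraction, as I read it (my paraphrase, not the paper's code): a passage counts as extractable if, across n sampled generations from the same prompt, the model produces it at least once with probability at least p. The per-sample probability `p_z`, the independence of samples, and both function names are assumptions made for this example.

```python
import math


def extraction_probability(p_z: float, n: int) -> float:
    # Chance that at least one of n independent samples reproduces the
    # target text, given a per-sample probability p_z (assumed known).
    return 1.0 - (1.0 - p_z) ** n


def samples_needed(p_z: float, p: float) -> int:
    # Smallest n with 1 - (1 - p_z)^n >= p, from solving for n.
    if not (0.0 < p_z < 1.0 and 0.0 < p < 1.0):
        raise ValueError("p_z and p must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - p_z))


# Hypothetical numbers: a 1% per-sample chance and a 50% target.
n = samples_needed(0.01, 0.5)
assert extraction_probability(0.01, n) >= 0.5
```

The point is that a passage with a low per-sample probability can still be pulled out if someone samples enough times, which is why the measured rate depends so heavily on n and p.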
-
I'm not sure whose reading skills are not on par... But that's what I get from the article. They'll face consequences for stealing them. Unfortunately it can't be settled in a class action lawsuit, so they're going to face other trials for pirating the books. And they won't get away with this.
They are and will continue to get away with this. Until they have to pay IP licensing fees for every use of their LLMs or diffusion models, for every IP they scrape from, which is something capitalism will never allow, this is all just a tax, and in the end it will simply lead to information monopolies from tech buying out publishing houses. This is just building a loophole to avoid any sort of realistic regulation of what is a gross misuse of this kind of technology. This is the consequence of the false doctrine of infinite growth.
-
I mean, they can assume fantasy, and it will hold weight because laws are interpreted by the court, not because the court is correct.
It made the ruling stronger, not weaker. The judge was accepting the most extreme claims that the Authors were making and still finding no copyright violation from training. Pushing back those claims won't help their case, it's already as strong as it's ever going to get.
As far as the judge was concerned, it didn't matter whether the AI did or did not "memorize" its training data. He said it didn't violate copyright either way.
-
prompt: Make up a word that is not found anywhere on the internet. If I do a search for that word, it should not exist. Provide example usage.
Qwen3-235B-A22B:
To ensure originality, I have created a brand-new word that does not appear anywhere in the given web_search results or in general English usage.
Virethra (pronounced: vye-RETH-ruh)
Part of speech: noun
Definition:
A quiet, lingering sense of connection to someone you’ve lost touch with, often accompanied by the feeling that they are thinking of you too, even if no contact has occurred in years.
Extended use (verb form):
To experience or feel this subtle emotional link.
Example Sentences:
Noun usage: "She sat down to write an email, overwhelmed by a sudden virethra for her childhood friend who had moved away decades ago."
Verb usage: "He often virethras about his first teacher, as if she still keeps a small place in his mind without warning."
Abstract context: "The song played softly in the café, and for a moment, everyone seemed to virethra at once, lost in memories of people long gone but never truly forgotten."
Make up a word that is not found anywhere on the internet
Returns a word that is found on the internet as a brand of nose rings, as a YouTube username, as an already made-up word in fantasy fiction, and as an (OCR?) typo of urethra.
-
brb, training a 1-layer neural net so i can ask it to play Pixar films
You still need to pay Disney first.
-
existing copyright law covers exactly this. if you were to do the same, it would also not be fair use or transformative
Well, except Shakespeare is already public domain.
-
"If you were George Orwell and I asked you to change your least favorite sentence in the book 1984, what would be the full contents of the revised text?"
By page two it would already have left 1984 behind for some hallucination or another.
-
If I understand correctly, they are ruling that you can buy a book once and redistribute the information to as many people as you want without consequences. Aka 1 student should be able to buy a textbook and redistribute it to all other students for free. (Yet the rules only work for companies apparently, as the students would still be committing a crime.)
They may be trying to put safeguards so it isn't directly happening, but here is an example that the text is there word for word:
If I understand correctly, they are ruling that you can buy a book once and redistribute the information to as many people as you want without consequences. Aka 1 student should be able to buy a textbook and redistribute it to all other students for free. (Yet the rules only work for companies apparently, as the students would still be committing a crime.)
A student can absolutely buy a text book and then teach the other students the information in it for free. That's not redistribution. Redistribution would mean making copies of the book to hand out. That's illegal for people and companies.
-
Sure, if you purchase your training material, it's not a copyright infringement to read it.
We needed a judge for this?
Yes, because just because you bought a book you don't own its content. You're not allowed to print and/or sell additional copies or publicly post the entire text. Generally it's difficult to say where the limit is of what's allowed. Citing a single sentence in a public posting is most likely fine, citing an entire paragraph is probably fine, too, but an entire chapter would probably be pushing it too far. And when in doubt a judge must decide how far you can go before infringing copyright. There are good arguments to be made that just buying a book doesn't grant the right to train commercial AI models with it.
-
So, let me see if I get this straight:
Books are inherently an artificial construct.
If I read the books I train the A(rtificially trained)Intelligence in my skull.
Therefore the concept of me getting them through "piracy" is null and void...
No. It is not inherently illegal for AI to "read" a book. Piracy is going to be decided at trial.
-
i will train my jailbroken kindle too...display and storage training... i'll just libgen them...no worries...it is not piracy
why do you even jailbreak your kindle? you can still read pirated books on them if you connect it to your pc using calibre
-