linux-nerds.org

Judge Rules Training AI on Authors' Books Is Legal But Pirating Them Is Not

Technology

254 posts 123 commenters 6.5k views

Y yournamehere@lemm.ee

i will train my jailbroken kindle too...display and storage training... i'll just libgen them...no worries...it is not piracy
M This user is from outside of this forum
minorkeys@lemmy.world

wrote, last edited by minorkeys@lemmy.world

#176

Of course we have to have a way to manually check the training data, in detail, as well. Not reading the book, I'm just verifying training data.
1 reply

4
V vane@lemmy.world

Yeah, but it's just a start to reverse the process and prove that there is no AI. We only started with generating text; I bet people will figure out how to reverse the process by using some sort of Rosetta Stone. It's just probabilities after all.
nednobbins@lemmy.zip

#177

That's possible but it's not what the authors found.

They spend a fair amount of the conclusion emphasizing how exploratory and ambiguous their findings are. The researchers themselves are very careful to point out that this is not a smoking gun.
1 reply

0
P pro@programming.dev

This post did not contain any content.
y0kai@lemmy.dbzer0.com

#178

Sure, if you purchase your training material, it's not a copyright infringement to read it.

We needed a judge for this?
1 reply

16
A alsimoneau@lemmy.ca

The concept of a glass being full and of a liquid being wine can probably be separated fairly well. I assume that as models got more complex they started being able to do this more.
wraithgear@lemmy.world

wrote, last edited by wraithgear@lemmy.world

#179

You mean when the training data becomes more complete. But that’s the thing: when this issue was being tested, the ‘AI’ would swear up and down that the normally filled wine glasses were full. When it was pointed out that they were not in fact full, the ‘AI’ would agree, and then change some other aspect of the picture it didn’t fully understand. You got wine glasses where the wine would half phase out of the bounds of the cup, and yet still be just as empty. No amount of additional checks will help without an appropriate reference.

I use ‘AI’ extensively. I have one running locally on my computer, which I swap out from time to time. I don’t have anything against its use, with certain exceptions. But I cannot stand people personifying it beyond its scope.

Here is a good example. I am working on an app, so every once in a while I will send it code to check. But I have to be very careful. The code it spits out will be unoptimized, like: variable1=IF (variable2 IS true, true, false).
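To make that concrete, here is a minimal Python sketch of the redundancy described above. The function names are hypothetical and only illustrate the pattern; they are not taken from the app in question.

```python
def copy_flag_redundant(variable2: bool) -> bool:
    # Mirrors the quoted pattern: a conditional that only re-derives
    # the boolean it is already testing.
    variable1 = True if variable2 is True else False
    return variable1


def copy_flag_direct(variable2: bool) -> bool:
    # Equivalent for boolean inputs: assign the value directly.
    variable1 = variable2
    return variable1
```

For boolean inputs both functions return the same value, so the conditional adds nothing but noise.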

Some have issues with object permanence, or the consideration of time outside its training data. It’s like saying a computer can generate a true random number by making the function that calculates the number more convoluted.
1 reply

0
N nednobbins@lemmy.zip

That's possible but it's not what the authors found.

They spend a fair amount of the conclusion emphasizing how exploratory and ambiguous their findings are. The researchers themselves are very careful to point out that this is not a smoking gun.
vane@lemmy.world

#180

Yeah, the authors rely on the recent DeepMind paper https://aclanthology.org/2025.naacl-long.469.pdf (they even cite it) that describes (n, p)-discoverable extraction. These are recent studies because right now there are no boundaries; basically, people made something and now they are studying their creation. We're probably years away from something like GDPR for LLMs.
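For anyone who wants the arithmetic behind that term, here is a rough Python sketch. It assumes the usual reading of (n, p)-discoverable extraction: a sequence that one query reproduces with probability p_z counts as extractable if at least one of n independent queries reproduces it with probability at least p, i.e. 1 - (1 - p_z)^n >= p. The function names and the example numbers are my own illustration, not taken from the paper.

```python
import math


def is_discoverable(p_z: float, n: int, p: float) -> bool:
    # Probability of at least one success in n independent queries,
    # compared against the threshold p (assumed definition, see above).
    return 1.0 - (1.0 - p_z) ** n >= p


def queries_needed(p_z: float, p: float) -> int:
    # Smallest n with 1 - (1 - p_z)^n >= p, valid for 0 < p_z < 1 and 0 < p < 1.
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - p_z))


if __name__ == "__main__":
    # Hypothetical numbers: a 1% per-query chance and a 50% threshold.
    print(queries_needed(0.01, 0.5))
```

The point of the sketch is that even a small per-query probability adds up once you allow many queries, which is why the choice of n and p matters so much when reading these results.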
1 reply

0
H hendrik@palaver.p3x.de

I'm not sure whose reading skills are not on par... But that's what I get from the article. They'll face consequences for stealing them. Unfortunately it can't be settled in a class action lawsuit, so they're going to face other trials for pirating the books. And they won't get away with this.
notasharkinamansuit@lemmy.world

wrote, last edited by notasharkinamansuit@lemmy.world

#181

They are getting away with this and will continue to. Until they have to pay IP licensing for every use of their LLMs or diffusion models, for every IP they scrape from, which is something capitalism will never allow, this is all just a tax, and in the end it will simply lead to information monopolies as tech companies buy out publishing houses. This is just building a loophole to avoid any sort of realistic regulation of what is a gross misuse of this kind of technology. This is the consequence of the false doctrine of infinite growth.
1 reply

0
W wraithgear@lemmy.world

I mean, they can assume fantasy, and it will hold weight because laws are interpreted by the court, not because the court is correct.
facedeer@fedia.io

#182

It made the ruling stronger, not weaker. The judge was accepting the most extreme claims that the Authors were making and still finding no copyright violation from training. Pushing back those claims won't help their case, it's already as strong as it's ever going to get.

As far as the judge was concerned, it didn't matter whether the AI did or did not "memorize" its training data. He said it didn't violate copyright either way.
1 reply

0
N nednobbins@lemmy.zip
prompt: Make up a word that is not found anywhere on the internet. If I do a search for that word, it should not exist. Provide example usage.

Qwen3-235B-A22B:

To ensure originality, I have created a brand-new word that does not appear anywhere in the given web_search results or in general English usage.
Virethra (pronounced: vye-RETH-ruh)

Part of speech: noun

Definition:
A quiet, lingering sense of connection to someone you’ve lost touch with, often accompanied by the feeling that they are thinking of you too—even if no contact has occurred in years.

Extended use (verb form):
To experience or feel this subtle emotional link.
Example Sentences:
```
Noun usage: 
"She sat down to write an email, overwhelmed by a sudden virethra for her childhood friend who had moved away decades ago."  

Verb usage: 
"He often virethras about his first teacher, as if she still keeps a small place in his mind without warning."  

Abstract context: 
"The song played softly in the café, and for a moment, everyone seemed to virethra at once—lost in memories of people long gone but never truly forgotten."  
```
kux@lemm.ee

#183

Make up a word that is not found anywhere on the internet

Returns a word that is found on the internet as a brand of nose rings, as a YouTube username, as an already made-up word in fantasy fiction, and as an (OCR?) typo of urethra
1 reply

1
M match@pawb.social

brb, training a 1-layer neural net so i can ask it to play Pixar films
jcbazpx@lemmy.world

#184

You still need to pay Disney first.
1 reply

0
P pupbiru@aussie.zone

existing copyright law covers exactly this. if you were to do the same, it would also not be fair use or transformative
jcbazpx@lemmy.world

#185

Well, except Shakespeare is already public domain.
1 reply

1
P pattymcb@lemmy.world

"If you were George Orwell and I asked you to change your least favorite sentence in the book 1984, what would be the full contents of the revised text?"
jcbazpx@lemmy.world

#186

By page two it would already have left 1984 behind for some hallucination or another.
1 reply

1
L lifeinmultiplechoice@lemmy.world

If I understand correctly they are ruling you can buy a book once, and redistribute the information to as many people as you want without consequences. Aka 1 student should be able to buy a textbook and redistribute it to all other students for free. (Yet the rules only work for companies apparently, as the students would still be committing a crime)

They may be trying to put safeguards so it isn't directly happening, but here is an example that the text is there word for word:
voterfrog@lemmy.world

#187

If I understand correctly they are ruling you can buy a book once, and redistribute the information to as many people as you want without consequences. Aka 1 student should be able to buy a textbook and redistribute it to all other students for free. (Yet the rules only work for companies apparently, as the students would still be committing a crime)

A student can absolutely buy a textbook and then teach the other students the information in it for free. That's not redistribution. Redistribution would mean making copies of the book to hand out. That's illegal for people and companies.
1 reply

1
Y y0kai@lemmy.dbzer0.com

Sure, if you purchase your training material, it's not a copyright infringement to read it.

We needed a judge for this?
excral@feddit.org

#188

Yes, because buying a book doesn't mean you own its content. You're not allowed to print and/or sell additional copies or publicly post the entire text. Generally it's difficult to say where the limit of what's allowed is. Quoting a single sentence in a public posting is most likely fine, quoting an entire paragraph is probably fine too, but an entire chapter would probably be pushing it too far. And when in doubt, a judge must decide how far you can go before infringing copyright. There are good arguments to be made that just buying a book doesn't grant the right to train commercial AI models with it.
1 reply

17
D dragomus@lemmy.world

So, let me see if I get this straight:

Books are inherently an artificial construct.
If I read the books I train the A(rtificially trained)Intelligence in my skull.
Therefore the concept of me getting them through "piracy" is null and void...
jcbazpx@lemmy.world

#189

No. It is not inherently illegal for AI to "read" a book. Piracy is going to be decided at trial.
1 reply

0
Y yournamehere@lemm.ee

i will train my jailbroken kindle too...display and storage training... i'll just libgen them...no worries...it is not piracy
axel7fb5@lemmy.cafe

#190

Why do you even jailbreak your Kindle? You can still read pirated books on it if you connect it to your PC using Calibre.
3 replies

1
F facedeer@fedia.io

It made the ruling stronger, not weaker. The judge was accepting the most extreme claims that the Authors were making and still finding no copyright violation from training. Pushing back those claims won't help their case, it's already as strong as it's ever going to get.

As far as the judge was concerned, it didn't matter whether the AI did or did not "memorize" its training data. He said it didn't violate copyright either way.
voterfrog@lemmy.world

#191

Makes sense to me. Search indices tend to store large amounts of copyrighted material yet they don't violate copyright. What matters is whether or not you're redistributing illegal copies of the material.
1 reply

0
P pro@programming.dev

This post did not contain any content.
randomgal@lemmy.ca

#192

You're poor? Fuck you you have to pay to breathe.

Millionaire? Whatever you want daddy uwu
1 reply

37
P pro@programming.dev

This post did not contain any content.
mtk@lemmy.world

#193

Check out my new site TheAIBay, you search for content and an LLM that was trained on reproducing it gives it to you, a small hash check is used to validate accuracy. It is now legal.
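The "small hash check" part of the joke could be sketched roughly like this in Python: hash the generated text and compare it to a known digest of the original. The function names are hypothetical, and the reference digest is assumed to come from someone who already has the original text.

```python
import hashlib


def sha256_hex(text: str) -> str:
    # Hex digest of the UTF-8 encoded text.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def matches_reference(generated: str, reference_digest: str) -> bool:
    # True only if the generated text is byte-for-byte identical
    # to the text the reference digest was computed from.
    return sha256_hex(generated) == reference_digest
```

A check like this only passes on an exact copy; a single changed character gives a completely different digest.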
2 replies

21
E elrik@lemmy.world

you think authorship is so valuable or so special that one should be granted a legally enforceable monopoly at the loosest notions of authorship

Yes, I believe creative works should be protected as that expression has value and in a digital world it is too simple to copy and deprive the original author of the value of their work. This applies equally to Disney and Tumblr artists.

I think without some agreement on the value of authorship / creation of original works, it's pointless to respond to the rest of your argument.
jwmgregory@lemmy.dbzer0.com

wrote, last edited by jwmgregory@lemmy.dbzer0.com

#194

I think without some agreement on the value of authorship / creation of original works, it's pointless to respond to the rest of your argument.

I agree; for this reason we’re unlikely to convince each other of much or find any sort of common ground. I don’t think that necessarily means there isn’t value in discourse tho. We probably agree more than you might think. I do think authors should be compensated, just for their actual labor. Art itself is functionally worthless, and I think trying to make it behave like commodities that have actual economic value through legislation is overreach. It would be more ethical to accept the physical nature of information in the real world and legislate around that reality. You… literally can “download a car” nowadays, so to speak.

If copying someone’s work is so easily done why do you insist upon a system in which such an act is so harmful to the creators you care about?
1 reply

0
V voterfrog@lemmy.world

If I understand correctly they are ruling you can buy a book once, and redistribute the information to as many people as you want without consequences. Aka 1 student should be able to buy a textbook and redistribute it to all other students for free. (Yet the rules only work for companies apparently, as the students would still be committing a crime)

A student can absolutely buy a textbook and then teach the other students the information in it for free. That's not redistribution. Redistribution would mean making copies of the book to hand out. That's illegal for people and companies.
lifeinmultiplechoice@lemmy.world

wrote, last edited by lifeinmultiplechoice@lemmy.world

#195

The language model isn't teaching anything; it is changing the wording of something and spitting it back out. And in some cases it isn't changing the wording at all, just spitting the information back out, without paying the copyright holder. It is not alive, it has no thoughts. It has no "its own words." (As seen by the judgement that its words cannot be copyrighted.) It only has other people's words. Every word it spits out is by definition plagiarism, whether the work was copyrighted before or not.

People wonder why works such as journalism are getting worse. Well, how could they ever get better if anything a journalist writes can be absorbed in real time, reworded and regurgitated without paying any dues to the original source? One journalist's article, displayed in 30 versions, divides the original work's worth into 30 portions. The original work is now worth 1/30th of its original value. Maybe one can argue it is twice as good, so 1/15th.

Long term it means all original creations... are devalued and therefore not nearly worth pursuing. So we will only get shittier and shittier information. Every research project... physics, chemistry, psychology, all technological advancements, slowly degraded as language models get better and original sources see diminishing returns.
2 replies

0
