linux-nerds.org

Judge Rules Training AI on Authors' Books Is Legal But Pirating Them Is Not

Technology

254 posts, 123 commenters, 6.4k views

G gian@lemmy.grys.it

What a bad judge.

Why? Basically, he simply stated that you can use whatever material you want to train your model as long as you ask the author (or copyright holder) for permission to use it (and presumably pay for it).
lifeinmultiplechoice@lemmy.world

wrote, last edited by lifeinmultiplechoice@lemmy.world

#101

If I understand correctly, they are ruling that you can buy a book once and redistribute the information to as many people as you want without consequences. Aka one student should be able to buy a textbook and redistribute it to all other students for free. (Yet the rules apparently only work for companies, as the students would still be committing a crime.)

They may be trying to put safeguards so it isn't directly happening, but here is an example that the text is there word for word:
4 replies

3
I isveryloud@lemmy.ca

Gist:

What’s new: The Northern District of California has granted a summary judgment for Anthropic that the training use of the copyrighted books and the print-to-digital format change were both “fair use” (full order below box). However, the court also found that the pirated library copies that Anthropic collected could not be deemed as training copies, and therefore, the use of this material was not “fair”. The court also announced that it will have a trial on the pirated copies and any resulting damages, adding:

“That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages.”
deathsembrace@lemmy.world

wrote

#102

So I can't use any of these works because it's plagiarism but AI can?
4 replies

9
G gissamittjobb@lemmy.ml

It's extremely frustrating to read this comment thread because it's obvious that so many of you didn't actually read the article, or even half-skim it, or even attempt to comprehend its title for more than a second.

For shame.
lifeinmultiplechoice@lemmy.world

wrote

#103

"While the copies used to convert purchased print library copies into digital library copies were slightly disfavored by the second factor (nature of the work), the court still found “on balance” that it was a fair use because the purchased print copy was destroyed and its digital replacement was not redistributed."

So you find this to be valid?
To me it is absolutely being redistributed
1 reply

1
dojan@pawb.social

wrote

#104

LLMs don’t learn, and they’re not people. Applying the same logic doesn’t make much sense.
1 reply

1
F freedomadvocate@lemmy.net.au

Makes sense. AI can “learn” from and “read” a book in the same way a person can and does, as long as it is acquired legally. AI doesn’t reproduce a work that it “learns” from, so why would it be illegal?

Some people just see “AI” and want everything about it outlawed basically. If you put some information out into the public, you don’t get to decide who does and doesn’t consume and learn from it. If a machine can replicate your writing style because it could identify certain patterns, words, sentence structure, etc then as long as it’s not pretending to create things attributed to you, there’s no issue.
elrik@lemmy.world

wrote

#105

AI can “learn” from and “read” a book in the same way a person can and does

This statement is the basis for your argument and it is simply not correct.

Training LLMs and similar AI models is much closer to a sophisticated lossy compression algorithm than it is to human learning. The processes are not at all similar given our current understanding of human learning.

AI doesn’t reproduce a work that it “learns” from, so why would it be illegal?

The current Disney lawsuit against Midjourney is illustrative - literally, it includes numerous side-by-side comparisons - of how AI models are capable of recreating iconic copyrighted work that is indistinguishable from the original.

If a machine can replicate your writing style because it could identify certain patterns, words, sentence structure, etc then as long as it’s not pretending to create things attributed to you, there’s no issue.

An AI doesn't create works on its own. A human instructs AI to do so. Attribution is also irrelevant. If a human uses AI to recreate the exact tone, structure and other nuances of say, some best selling author, they harm the marketability of the original works which fails fair use tests (at least in the US).
2 replies

7
D deathsembrace@lemmy.world

So I can't use any of these works because it's plagiarism but AI can?
isveryloud@lemmy.ca

wrote

#106

My interpretation was that AI companies can train on material they are licensed to use, but the courts have deemed that Anthropic pirated this material as they were not licensed to use it.

In other words, if Anthropic bought the physical or digital books, it would be fine so long as their AI couldn't spit it out verbatim, but they didn't even do that, i.e. the AI crawler pirated the book.
1 reply

16
V vane@lemmy.world

OK, so you can buy books and scan them, or buy ebooks, and use them for AI training, but you can't just download pirated books from the internet to train AI. Did I understand that correctly?
forkdestroyer@infosec.pub

wrote

#107

Make an AI that is trained on the books.

Tell it to tell you a story for one of the books.

Read the story without paying for it.

The law says this is ok now, right?
4 replies

4
F forkdestroyer@infosec.pub

Make an AI that is trained on the books.

Tell it to tell you a story for one of the books.

Read the story without paying for it.

The law says this is ok now, right?
loreleisanktheship@lemmy.ml

wrote

#108

As long as they don't use exactly the same words in the book, yeah, as I understand it.
1 reply

5
J j0ester@lemmy.world

Huh? Didn’t Meta skip getting any permission and pirate a lot of books to train their model?
gian@lemmy.grys.it

wrote

#109

True. And I will be happy if someone sues them and the judge says the same thing.
1 reply

1
L lifeinmultiplechoice@lemmy.world

If I understand correctly, they are ruling that you can buy a book once and redistribute the information to as many people as you want without consequences. Aka one student should be able to buy a textbook and redistribute it to all other students for free. (Yet the rules apparently only work for companies, as the students would still be committing a crime.)

They may be trying to put safeguards so it isn't directly happening, but here is an example that the text is there word for word:
gian@lemmy.grys.it

wrote

#110

If I understand correctly, they are ruling that you can buy a book once and redistribute the information to as many people as you want without consequences. Aka one student should be able to buy a textbook and redistribute it to all other students for free. (Yet the rules apparently only work for companies, as the students would still be committing a crime.)

Well, it would be interesting if this case were used as precedent in a case involving a single student who does the same thing. But you are right.
1 reply

1
D deathsembrace@lemmy.world

So I can't use any of these works because it's plagiarism but AI can?
freedomadvocate@lemmy.net.au

wrote

#111

You can “use” them to learn from, just like “AI” can.

What exactly do you think AI does when it “learns” from a book, for example? Do you think it will just spit out the entire book if you ask it to?
2 replies

4
E elrik@lemmy.world

AI can “learn” from and “read” a book in the same way a person can and does

This statement is the basis for your argument and it is simply not correct.

Training LLMs and similar AI models is much closer to a sophisticated lossy compression algorithm than it is to human learning. The processes are not at all similar given our current understanding of human learning.

AI doesn’t reproduce a work that it “learns” from, so why would it be illegal?

The current Disney lawsuit against Midjourney is illustrative - literally, it includes numerous side-by-side comparisons - of how AI models are capable of recreating iconic copyrighted work that is indistinguishable from the original.

If a machine can replicate your writing style because it could identify certain patterns, words, sentence structure, etc then as long as it’s not pretending to create things attributed to you, there’s no issue.

An AI doesn't create works on its own. A human instructs AI to do so. Attribution is also irrelevant. If a human uses AI to recreate the exact tone, structure and other nuances of say, some best selling author, they harm the marketability of the original works which fails fair use tests (at least in the US).
freedomadvocate@lemmy.net.au

wrote, last edited by freedomadvocate@lemmy.net.au

#112

Your very first statement calling my basis for my argument incorrect is incorrect lol.

LLMs “learn” things from the content they consume. They don’t just take the content in wholesale and keep it there to regurgitate on command.

On your last part, unless someone uses AI to recreate the tone etc of a best selling author and then markets their book/writing as being from said best selling author, and doesn’t use trademarked characters etc, there’s no issue. You can’t copyright a style of writing.
2 replies

3
P pro@programming.dev

This post did not contain any content.
saharamaleikuhm@feddit.org

wrote

#113

But I thought they admitted to torrenting terabytes of ebooks?
2 replies

32
F freedomadvocate@lemmy.net.au

You can “use” them to learn from, just like “AI” can.

What exactly do you think AI does when it “learns” from a book, for example? Do you think it will just spit out the entire book if you ask it to?
deathsembrace@lemmy.world

wrote

#114

It can't speak or use any words without them being someone else's words it learned from? Unless it's giving sources, everything is always from something it learned, because it cannot speak or use words without that source in the first place?
1 reply

0
G gian@lemmy.grys.it

If I understand correctly, they are ruling that you can buy a book once and redistribute the information to as many people as you want without consequences. Aka one student should be able to buy a textbook and redistribute it to all other students for free. (Yet the rules apparently only work for companies, as the students would still be committing a crime.)

Well, it would be interesting if this case were used as precedent in a case involving a single student who does the same thing. But you are right.
fum@lemmy.world

wrote

#115

This was my understanding also, and why I think the judge is bad at their job.
1 reply

0
E elrik@lemmy.world

AI can “learn” from and “read” a book in the same way a person can and does

This statement is the basis for your argument and it is simply not correct.

Training LLMs and similar AI models is much closer to a sophisticated lossy compression algorithm than it is to human learning. The processes are not at all similar given our current understanding of human learning.

AI doesn’t reproduce a work that it “learns” from, so why would it be illegal?

The current Disney lawsuit against Midjourney is illustrative - literally, it includes numerous side-by-side comparisons - of how AI models are capable of recreating iconic copyrighted work that is indistinguishable from the original.

If a machine can replicate your writing style because it could identify certain patterns, words, sentence structure, etc then as long as it’s not pretending to create things attributed to you, there’s no issue.

An AI doesn't create works on its own. A human instructs AI to do so. Attribution is also irrelevant. If a human uses AI to recreate the exact tone, structure and other nuances of say, some best selling author, they harm the marketability of the original works which fails fair use tests (at least in the US).
jwmgregory@lemmy.dbzer0.com

wrote

#116

Even if we accept all of your market-liberal premises without question... within your own rhetorical framework, the Disney lawsuit should be decided against Disney.

If a human uses AI to recreate the exact tone, structure and other nuances of say, some best selling author, they harm the marketability of the original works which fails fair use tests (at least in the US).

Says who? In a free market, why is competition from similar products and brands such a threat that it must be outlawed? Think reasonably about what you are advocating... you think authorship is so valuable or so special that one should be granted a legally enforceable monopoly on even the loosest notions of authorship. This is the definition of a slippery slope, and yet it is the status quo of the society we live in.

On it "harming marketability of the original works," frankly, that's a fiction, and anyone advocating such ideas should just fucking weep about it instead of enforcing overreaching laws on the rest of us. If you can't sell your art because a machine made "too good a copy" of it, it wasn't good art in the first place, and that is not the fault of the machine. Even big pharma doesn't get to outright ban generic medications (even though they certainly tried)... it is patently fucking absurd to decry artists' lack of a state-enforced monopoly on their work. Why do you think we should extend such a radical policy towards... checks notes... tumblr artists and other commission-based creators? It's not good when big companies do it for themselves through lobbying, and it wouldn't be good to do it for "the little guy," either. The real artists working in industry don't want to change the law this way because they know it doesn't work in their favor. Disney's lawsuit is in the interest of Disney and big capital, not artists themselves, despite what these large conglomerates that trade in IPs and dreams might try to convince the art world writ large of.
1 reply

0
L lifeinmultiplechoice@lemmy.world

If I understand correctly, they are ruling that you can buy a book once and redistribute the information to as many people as you want without consequences. Aka one student should be able to buy a textbook and redistribute it to all other students for free. (Yet the rules apparently only work for companies, as the students would still be committing a crime.)

They may be trying to put safeguards so it isn't directly happening, but here is an example that the text is there word for word:
freedomadvocate@lemmy.net.au

wrote

#117

Not at all true. AI doesn’t just reproduce, on demand, content it was trained on.
1 reply

0
I isveryloud@lemmy.ca

My interpretation was that AI companies can train on material they are licensed to use, but the courts have deemed that Anthropic pirated this material as they were not licensed to use it.

In other words, if Anthropic bought the physical or digital books, it would be fine so long as their AI couldn't spit it out verbatim, but they didn't even do that, i.e. the AI crawler pirated the book.
devils_advocate@sh.itjust.works

wrote

#118

Does buying the book give you license to digitise it?

Does owning a digital copy of the book give you license to convert it into another format and copy it into a database?

Definitions of "Ownership" can be very different.
3 replies

6
F fum@lemmy.world

This was my understanding also, and why I think the judge is bad at their job.
lifeinmultiplechoice@lemmy.world

wrote

#119

I suppose someone could develop an LLM that digests textbooks, rewords the text, and spits it back out, and then distribute the result for free, page for page. I don't think you can copyright the math problems, so if the wording of the text is what gives it credence, that would have been changed.
1 reply

0
A ayane@lemmy.vg

I joined lemmy specifically to avoid this reddit mindset of jumping to conclusions after reading a headline

Guess some things never change...
jwmgregory@lemmy.dbzer0.com

wrote

#120

Well to be honest lemmy is less prone to knee-jerk reactionary discussion but on a handful of topics it is virtually guaranteed to happen no matter what, even here. For example, this entire site, besides a handful of communities, is vigorously anti-AI; and in the words of u/jsomae@lemmy.ml elsewhere in this comment chain:

"It seems the subject of AI causes lemmites to lose all their braincells."

I think there is definitely an interesting take on the sociology of the digital age in here somewhere but it's too early in the morning to be tapping something like that out lol
1 reply

5
