Judge Rules Training AI on Authors' Books Is Legal But Pirating Them Is Not
-
My interpretation was that AI companies can train on material they are licensed to use, but the courts have deemed that Anthropic pirated this material as they were not licensed to use it.
In other words, if Anthropic had bought the physical or digital books, it would be fine so long as their AI couldn't spit them out verbatim; but they didn't even do that, i.e. the AI crawler pirated the books.
Does buying the book give you license to digitise it?
Does owning a digital copy of the book give you license to convert it into another format and copy it into a database?
Definitions of "Ownership" can be very different.
-
This was my understanding also, and why I think the judge is bad at their job.
I suppose someone could develop an LLM that digests textbooks, rewords the text, and spits it back out, then distributes it for free, page for page. You can't copyright the math problems, I don't think... so if the text wording is what gives it protection, that would have been changed.
-
I joined lemmy specifically to avoid this reddit mindset of jumping to conclusions after reading a headline
Guess some things never change...
Well, to be honest, lemmy is less prone to knee-jerk reactionary discussion, but on a handful of topics it is virtually guaranteed to happen no matter what, even here. For example, this entire site, besides a handful of communities, is vigorously anti-AI; and in the words of u/jsomae@lemmy.ml elsewhere in this comment chain:
"It seems the subject of AI causes lemmites to lose all their braincells."
I think there is definitely an interesting take on the sociology of the digital age in here somewhere but it's too early in the morning to be tapping something like that out lol
-
You're getting douchevoted because on lemmy any AI-related comment that isn't negative enough about AI is the Devil's Work.
Some communities on this site speak about machine learning exactly how I see grungy Europeans from pre-18th century manuscripts speaking about witches, Satan, and evil... as if it is some pervasive, black-magic miasma.
As someone who is in the field of machine learning academically/professionally, it's honestly kind of shocking and has largely informed my opinion of society at large as an adult. No one puts any effort into learning if they see the letters "A" and "I" in all caps next to each other; they immediately turn their brain off and start regurgitating points, responding reflexively, on Lemmy or otherwise. People talk about it so confidently while being so frustratingly unaware of their own ignorance on the matter, which, for lack of a better comparison... reminds me a lot of how, historically and in fiction, human beings have treated literal magic.
That's my main issue with the entire swath of "pro vs anti AI" discourse... all these people treating something that, to me, is simple & daily reality as something entirely different than my own personal notion of it.
-
You can “use” them to learn from, just like “AI” can.
What exactly do you think AI does when it “learns” from a book, for example? Do you think it will just spit out the entire book if you ask it to?
I am educated on this. When an AI learns, it passes an input through a series of functions that are joined at the output. The functions that produce the best output are then developed further. Individuals do not process information like that. With poor exploration and biasing, the output of an AI model could look identical to its input. It did not "learn" any more than a downloaded video run through a compression algorithm did.
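The "series of functions developed further toward the best output" described above can be sketched as a toy training loop (a hypothetical minimal example, not any real model): a single linear function whose one parameter is repeatedly nudged in whatever direction reduces the output error.

```python
# Minimal sketch of gradient-based training (assumed toy example):
# one learnable "function" pred = w * x, with w adjusted toward
# whatever value best reproduces the target outputs.
def train(inputs, targets, lr=0.1, epochs=200):
    w = 0.0  # the single learnable parameter
    for _ in range(epochs):
        for x, y in zip(inputs, targets):
            pred = w * x                 # forward pass through the "function"
            grad = 2 * (pred - y) * x    # how the error changes as w changes
            w -= lr * grad               # develop the function further
    return w

# Learn y = 2x from a few examples; w converges near 2.0.
w = train([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(round(w, 3))
```

Note the model ends up storing only the adjusted parameter, not the training examples themselves, which is the crux of the "is this learning or copying" dispute in the thread.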
-
LLMs don’t learn, and they’re not people. Applying the same logic doesn’t make much sense.
The judge isn't saying that they learn or that they're people. He's saying that training falls into the same legal classification as learning.
-
Your very first statement, calling the basis for my argument incorrect, is itself incorrect lol.
LLMs “learn” things from the content they consume. They don’t just take the content in wholesale and keep it there to regurgitate on command.
On your last part: unless someone uses AI to recreate the tone etc. of a best-selling author, and then markets their book/writing as being from said best-selling author, and doesn't use trademarked characters etc., there's no issue. You can't copyright a style of writing.
If what you are saying is true, why were these "AI"s incapable of rendering a full wine glass? It "knows" the concept of a full glass of water, but because of humanity's social pressures, a full wine glass being the epitome of gluttony, artwork did not depict a full wine glass. No matter how AI prompters demanded it, the model was unable to link the concepts until reference material was literally created for it to regurgitate. It seems "AI" doesn't really learn, but regurgitates art in collages of taken assets, smoothed over at the seams.
-
I suppose someone could develop an LLM that digests textbooks, rewords the text, and spits it back out, then distributes it for free, page for page. You can't copyright the math problems, I don't think... so if the text wording is what gives it protection, that would have been changed.
If a human did that it’s still plagiarism.
-
What a bad judge.
Why? Basically he simply stated that you can use whatever material you want to train your model, as long as you ask permission to use it (and presumably pay for it) from the author (or copyright holder).
"Fair use" is the exact opposite of what you're saying here. It means that you don't need to ask for any permission. The judge ruled that obtaining illegitimate copies was unlawful, but that use without the creator's consent is perfectly fine.
-
Not at all true. AI doesn’t just reproduce content it was trained on on demand.
It can; the only thing stopping it is being specifically told not to, and that check actually being enforced. It is completely capable of plagiarizing otherwise.
-
Gist:
What’s new: The Northern District of California has granted a summary judgment for Anthropic that the training use of the copyrighted books and the print-to-digital format change were both “fair use” (full order in the box below). However, the court also found that the pirated library copies that Anthropic collected could not be deemed training copies, and therefore the use of this material was not “fair”. The court also announced that it will hold a trial on the pirated copies and any resulting damages, adding:
“That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages.”
Formatting thing: if you start a line in a new paragraph with four spaces, it assumes that you want to display the text as code and won't line-break it.
This means that the last part of your comment is a long line that people need to scroll to see. If you remove one of the spaces, or remove the empty line between it and the previous paragraph, it'll look like a normal comment.
With an empty line of space:
1 space - and a little bit of writing just to see how the text will wrap. I don't really have anything that I want to put here, but I need to put enough here to make it long enough to wrap around. This is likely enough.
2 spaces - and a little bit of writing just to see how the text will wrap. I don't really have anything that I want to put here, but I need to put enough here to make it long enough to wrap around. This is likely enough.
3 spaces - and a little bit of writing just to see how the text will wrap. I don't really have anything that I want to put here, but I need to put enough here to make it long enough to wrap around. This is likely enough.
4 spaces - and a little bit of writing just to see how the text will wrap. I don't really have anything that I want to put here, but I need to put enough here to make it long enough to wrap around. This is likely enough.
-
If a human did that it’s still plagiarism.
Oh I agree it should be, but following the judge's ruling, I don't see how it could be. You trained an LLM on textbooks that were purchased, not pirated. And the LLM distributed the responses.
(Unless you mean the human reworded them, then yeah, we aren't special apparently)
-
So I can't use any of these works because it's plagiarism but AI can?
Why would it be plagiarism if you use the knowledge you gain from a book?
-
Does buying the book give you license to digitise it?
Does owning a digital copy of the book give you license to convert it into another format and copy it into a database?
Definitions of "Ownership" can be very different.
You can digitize the books you own; you do not need a license for that. And of course you could put that digital format into a database, as databases are explicit exceptions in copyright law. If you want to go to the extreme: delete the first copy, and then you have it only in the database. However, AIs/LLMs are not based on databases, but on neural networks. The original data gets lost when it is "learned".
-
Oh I agree it should be, but following the judge's ruling, I don't see how it could be. You trained an LLM on textbooks that were purchased, not pirated. And the LLM distributed the responses.
(Unless you mean the human reworded them, then yeah, we aren't special apparently)
Yes, on the second part. Just rearranging or replacing words in a text is not transformative, which is a requirement. There is an argument that "AI"s are capable of doing transformative work, but the tokenizing and weighting process is not magic, and in my use of multiple LLMs they do not have an understanding of the material any more than a dictionary understands the material printed on its pages.
An example was the wine glass problem. Art "AI"s were unable to display a wine glass filled to the top. No matter how it was prompted, or what style it aped, it would fail to do so and report back that the glass was full. But it could render a full glass of water. It didn't understand what a full glass was, not even for the water. How was this possible? Well, there was very little art of a full wine glass, because society has an unspoken rule that a full wine glass is the epitome of gluttony; it is to be savored, not drunk. Whereas references to full glasses of water were abundant. It doesn't know what full means, just that pictures of full glasses of water are tied to the phrases "full", "glass", and "water".
-
If what you are saying is true, why were these "AI"s incapable of rendering a full wine glass? It "knows" the concept of a full glass of water, but because of humanity's social pressures, a full wine glass being the epitome of gluttony, artwork did not depict a full wine glass. No matter how AI prompters demanded it, the model was unable to link the concepts until reference material was literally created for it to regurgitate. It seems "AI" doesn't really learn, but regurgitates art in collages of taken assets, smoothed over at the seams.
Copilot did it just fine
-
Formatting thing: if you start a line in a new paragraph with four spaces, it assumes that you want to display the text as code and won't line-break it.
This means that the last part of your comment is a long line that people need to scroll to see. If you remove one of the spaces, or remove the empty line between it and the previous paragraph, it'll look like a normal comment.
With an empty line of space:
1 space - and a little bit of writing just to see how the text will wrap. I don't really have anything that I want to put here, but I need to put enough here to make it long enough to wrap around. This is likely enough.
2 spaces - and a little bit of writing just to see how the text will wrap. I don't really have anything that I want to put here, but I need to put enough here to make it long enough to wrap around. This is likely enough.
3 spaces - and a little bit of writing just to see how the text will wrap. I don't really have anything that I want to put here, but I need to put enough here to make it long enough to wrap around. This is likely enough.
4 spaces - and a little bit of writing just to see how the text will wrap. I don't really have anything that I want to put here, but I need to put enough here to make it long enough to wrap around. This is likely enough.
Thanks, I had copy-pasted it from the website
-
Copilot did it just fine
It’s not full, but closer than it was.
And I specifically said that the AI was unable to do it until someone specifically made a reference for it, so that it could start passing the test; it’s a little bit late to prove much.
-
But I thought they admitted to torrenting terabytes of ebooks?
Facebook (Meta) torrented TBs from Libgen, and their internal chats leaked so we know about that, and IIRC they've been sued. Maybe you're thinking of that case?
-
FTA:
Anthropic warned against “[t]he prospect of ruinous statutory damages—$150,000 times 5 million books”: that would mean $750 billion.
So part of their argument is actually that they stole so much that it would be impossible for them/anyone to pay restitution, therefore we should just let them off the hook.
What it means is they don't own the models. They are the commons of humanity; the companies are merely temporary custodians. The nightmare ending is the elites keeping the most capable and competent models for themselves as private playthings. That must not be allowed to happen under any circumstances. Sue OpenAI, Anthropic, and the other enclosers; sue them for trying to take their ball and go home. Dispossess them, and sue the investors for their corrupt influence on research.