
Judge Rules Training AI on Authors' Books Is Legal But Pirating Them Is Not

Technology
  • Make an AI that is trained on the books.

    Tell it to tell you the story of one of the books.

    Read the story without paying for it.

    The law says this is ok now, right?

    The law says this is ok now, right?

    No.

    The judge accepted the fact that Anthropic prevents users from obtaining the underlying copyrighted text through interaction with its LLM, and that there are safeguards in the software that prevent a user from getting an entire copyrighted work out of that LLM. The ruling discusses the Google Books arrangement, where the books are scanned in their entirety, but where a user searching in Google Books can't actually retrieve more than a few snippets from any given book.

    Anthropic gets to keep its copy of the entire book. It doesn't get to transmit the contents of that book to someone else, even through the LLM service.

    The judge also explicitly stated that if the authors can put together evidence that it is possible for a user to retrieve their entire copyrighted work out of the LLM, they'd have a different case and could sue over it at that time.

  • But if one person buys a book, trains an "AI model" to recite it, then distributes that model, we good?

    No. The court made its ruling with the explicit understanding that the software was configured not to recite more than a few snippets from any copyrighted work, and would never produce an entire copyrighted work (or even a significant portion of a copyrighted work) in its output.

    And the judge specifically reserved that question, saying if the authors could develop evidence that it was possible for a user to retrieve significant copyrighted material out of the LLM, they'd have a different case and would be able to sue under those facts.

  • You're poor? Fuck you, you have to pay to breathe.

    Millionaire? Whatever you want daddy uwu

    That's kind of how I read it too.

    But as a side effect it means you're still allowed to photograph your own books at home as a private citizen if you own them.

    Prepare to never legally own another piece of media in your life. 😄

  • Yes, and that part of the case is going to trial. This was a preliminary judgment specifically about the training itself.

    specifically about the training itself.

    It's two issues being ruled on.

    Yes, as you mention, the act of training an LLM was ruled to be fair use, assuming that the digital training data was legally obtained.

    The other part of the ruling, which I think is really, really important for everyone, not just AI/LLM companies or developers, is that it is legal to buy printed books and digitize them into a central library with indexed metadata. Anthropic has to go to trial over the pirated books it simply downloaded from the internet, but has fully won the portion of the case about the physical books it bought and digitized.

  • I am not a lawyer. I am talking about reality.

    What does an LLM application (or training processes associated with an LLM application) have to do with the concept of learning? Where is the learning happening? Who is doing the learning?

    Who is stopping the individuals at the LLM company from learning or analysing a given book?

    From my experience living in the US, this is pretty standard American-style corruption. Lots of pomp and bombast and roleplay of sorts, but the outcome is no different from any other country that is in deep need of judicial and anti-corruption reform.

    What does an LLM application (or training processes associated with an LLM application) have to do with the concept of learning?

    No, you're framing the issue incorrectly.

    The law concerns itself with copying. When humans learn, they inevitably copy things. They may memorize portions of copyrighted material, and then retrieve those memories in doing something new with them, or just by recreating it.

    If the argument is that the mere act of copying for training an LLM is illegal copying, then what would we say about the use of copyrighted text for teaching children? They will memorize portions of what they read. They will later write some of them down. And if there is a person who memorizes an entire poem (or song) and then writes it down for someone else, that's actually a copyright violation. But if they memorize that poem or song and reuse it in creating something new and different, but with links and connections to that previous copyrighted work, then that kind of copying and processing is generally allowed.

    The judge here is analyzing what exact types of copying are permitted under the law, and for that, the copyright holders' argument would sweep too broadly and prohibit all sorts of methods that humans use to learn.

  • FTA:

    Anthropic warned against “[t]he prospect of ruinous statutory damages—$150,000 times 5 million books”: that would mean $750 billion.

    So part of their argument is actually that they stole so much that it would be impossible for them/anyone to pay restitution, therefore we should just let them off the hook.

    The problem isn't that Anthropic gets to use that defense, it's that others don't. The fact that the world is in a place where people can be fined 5+ years of a western European average salary for making a copy of one (1) book that does not materially affect the copyright holder in any way is insane, and it is good to point that out no matter who does it.

  • thanks I hate it xD

    The language model isn't teaching anything; it is changing the wording of something and spitting it back out. And in some cases it is not changing the wording at all, just spitting the information back out, without paying the copyright source. It is not alive, it has no thoughts. It has no words of "its own." (As seen by the judgement that its words cannot be copyrighted.) It only has other people's words. Every word it spits out is by definition plagiarism, whether the work was copyrighted before or not.

    People wonder why works such as journalism are getting worse. Well, how could they ever get better if anything a journalist writes can be absorbed in real time, reworded and regurgitated without paying any dues to the original source? One journalist's article, displayed in 30 versions, divides the original work's worth into 30 portions. The original work is now worth 1/30th its original value. Maybe one can argue it is twice as good, so 1/15th.

    Long term it means all original creations... are devalued and therefore not nearly worth pursuing. So we will only get shittier and shittier information. Every research project... Physics, Chemistry, Psychology, all technological advancements, slowly degraded as language models get better and original sources see diminishing returns.

    just spitting the information back out, without paying the copyright source

    The court made its ruling under the factual assumption that it isn't possible for a user to retrieve copyrighted text from that LLM, and explained that if a copyright holder does develop evidence that it is possible to get significant chunks of their copyrighted text out of that LLM, then they'd be able to sue under those facts and that evidence.

    It relies heavily on the analogy to Google Books, which scans in entire copyrighted books to build the database, but where users of the service simply cannot retrieve more than a few snippets from any given book. That way, Google cannot be said to be redistributing entire books to its users without the publisher's permission.

  • You’re right, each of the 5 million books’ authors should agree to less payment for their work, to make the poor criminals feel better.

    If I steal $100 from a thousand people and spend it all on hookers and blow, do I get out of paying that back because I don’t have the funds? Should the victims agree to get $20 back instead because that’s more within my budget?

    You think that 150,000 dollars, or roughly 250 weeks of full-time pretax wages at $15 an hour, is a reasonable fine for making a copy of one book which does no material harm to the copyright holder?

  • I wonder if the archive.org cases had any bearing on the decision.

    Archive.org was distributing the books themselves to users. Anthropic argued (and the authors suing them weren't able to show otherwise) that their software prevents users from actually retrieving books out of the LLM, and that it will only produce snippets of text from copyrighted works. And producing snippets in the context of something else, like commentary or criticism, is fair use.

  • It sounds like transferring an owned print book to digital and using it to train AI was deemed permissible. But downloading a book from the Internet and using it as training data is not allowed, even if you later purchase the pirated book. So, no one will be knocking down your door for scanning your books.

    This does raise an interesting case where libraries could end up training and distributing public domain AI models.

    I would actually be okay with libraries having those AI services. Even if they were available only for a fee it would be absurdly low and still waived for people with low or no income.

  • You think that 150,000 dollars, or roughly 250 weeks of full-time pretax wages at $15 an hour, is a reasonable fine for making a copy of one book which does no material harm to the copyright holder?

    No I don’t, but we’re not talking about a single copy of one book, and it is grovellingly insidious to imply that we are.

    We are talking about a company taking the work of an author, of thousands of authors, and using it as the backbone of a machine whose goal is to make those authors obsolete.

    When the people who own the slop-machine are making millions of dollars off the back of stolen works, they can very much afford to pay those authors. If you can’t afford to run your business without STEALING, then your business is a pile of flaming shit that deserves to fail.

  • None of the above. Every professional in the world, including me, owes our careers to looking at examples of other people's work and incorporating their work into our own work without paying a penny for it. Freely copying and imitating what we see around us has been a human norm for thousands of years - in a process known as "the spread of civilization". Relatively recently it was demonized - for purely business reasons, not moral ones - by people who got rich selling copies of other people's work and paying them a pittance known as a "royalty". That little piece of bait on the hook has convinced a lot of people to put a black hat on behavior that had been considered normal forever. If angry modern enlightened justice warriors want to treat a business concept like a moral principle and get all sweaty about it, that's fine with me, but I'm more of a traditionalist in that area.

    Nobody who is mad at this situation thinks that taking inspiration, riffing on, or referencing other people’s work is the problem when a human being does it. When a person writes, there is intention behind it.

    The issue is when a business, owned by those people you think ‘demonised’ inspiration, takes the works of authors and mulches them into something they lovingly named “The Pile”, in order to create derivative slop off the backs of creatives.

    When you, as a “professional”, ask AI to write you a novel, who is being inspired? Who is making the connections between themes? Who is carefully crafting the text to pay loving reference to another author's work? Not you. Not the algorithm that is guessing what word to shit out next based on math.

    These businesses have tricked you into thinking that what they are doing is noble.

  • One point I would refute here is determinism. AI models are, by default, deterministic. They are made from deterministic parts and "any combination of deterministic components will result in a deterministic system". Randomness has to be externally injected into e.g. current LLMs to produce 'non-deterministic' output.

    There is the notable exception of newer models like ChatGPT4, which seemingly produce non-deterministic outputs (i.e. give it the same sentence and it produces different outputs even with its temperature set to 0) - but my understanding is this is due to floating point number inaccuracies which lead to different token selection, and is thus a function of our current processor architectures and not inherent in the model itself.

    You're correct that a collection of deterministic elements will produce a deterministic result.

    LLMs produce a probability distribution over next tokens and then randomly select one of them. That's where the non-determinism enters the system. Even if you set the temperature to 0, you're going to get some randomness. The GPU can round two different real numbers to the same floating point representation. When that happens, it's a hardware-level coin toss on which token gets selected.

    You can test this empirically. Set the temperature to 0 and ask it, "give me a random number". You'll rarely get the same number twice in a row, no matter how similar you try to make the starting conditions.
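
    For anyone curious about the mechanics being described here, the following is a minimal, hypothetical sketch in Python (using NumPy; not any vendor's actual implementation) of the sampling step at the end of a language model's forward pass. It shows where the injected randomness lives and why temperature 0 reduces to a deterministic argmax over the raw scores (logits), up to the floating-point effects mentioned above:

        import numpy as np

        def sample_next_token(logits: np.ndarray, temperature: float,
                              rng: np.random.Generator) -> int:
            # Pick the index of the next token from a vector of raw scores.
            if temperature == 0.0:
                # Greedy decoding: deterministic only if the logits are bit-for-bit
                # reproducible. Tiny floating-point differences (e.g. from
                # non-associative parallel reductions on a GPU) can flip which
                # entry is the maximum, which is the effect described above.
                return int(np.argmax(logits))

            # Softmax with temperature: higher temperature flattens the distribution.
            scaled = logits / temperature
            probs = np.exp(scaled - scaled.max())
            probs /= probs.sum()

            # The explicit random draw: this is where randomness is injected.
            return int(rng.choice(len(probs), p=probs))

        rng = np.random.default_rng()
        logits = np.array([2.0, 1.9, -1.0, 0.5])
        print(sample_next_token(logits, temperature=0.0, rng=rng))  # always index 0
        print(sample_next_token(logits, temperature=1.0, rng=rng))  # varies run to run

    The point of the sketch is only that the model's arithmetic and the sampling policy are separate steps; whether a deployed system is reproducible end to end depends on both.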

  • I've hand calculated forward propagation (neural networks). AI does not learn; it's statistically optimized. AI “learning” is curve fitting (a small sketch of what that looks like is included at the end of this exchange). Human learning requires understanding, which AI is not capable of.

    Human learning requires understanding, which AI is not capable of.

    How could anyone know this?

    Is there some test of understanding that humans can pass and AIs can't? And if there are humans who can't pass it, do we consider them unintelligent?

    We don't even need to set the bar that high. Is there some definition of "understanding" that humans meet and AIs don't?
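
    As context for the terms above, here is a deliberately tiny, hypothetical Python sketch of what "forward propagation" and "curve fitting" refer to: a two-parameter model whose weights are nudged to reduce prediction error on example data. It is offered only as an illustration of the computation, not as a verdict on whether that amounts to learning or understanding:

        import numpy as np

        # Toy training data: points on the line y = 2x + 1 (the "curve" being fit).
        x = np.array([0.0, 1.0, 2.0, 3.0])
        y = 2.0 * x + 1.0

        w, b = 0.0, 0.0   # model parameters, starting from arbitrary values
        lr = 0.05         # learning rate

        for step in range(2000):
            y_hat = w * x + b    # forward propagation: compute the model's predictions
            error = y_hat - y
            # Gradient descent on mean squared error: nudge parameters to reduce error.
            w -= lr * 2.0 * np.mean(error * x)
            b -= lr * 2.0 * np.mean(error)

        print(round(w, 3), round(b, 3))  # converges toward 2.0 and 1.0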

  • If this is the ruling which causes you to lose trust that any legal system (not just the US') aligns with morality, then I have to question where you've been all this time.

    I could have been more clear, but it wasn't my intention to imply that this particular case is the turning point.

  • No I don’t, but we’re not talking about a single copy of one book, and it is grovellingly insidious to imply that we are.

    We are talking about a company taking the work of an author, of thousands of authors, and using it as the backbone of a machine whose goal is to make those authors obsolete.

    When the people who own the slop-machine are making millions of dollars off the back of stolen works, they can very much afford to pay those authors. If you can’t afford to run your business without STEALING, then your business is a pile of flaming shit that deserves to fail.

    Except it isn't, because the judge dismissed that part of the suit, saying that people have a complete right to digitise and train on works they have a legitimate copy of. So those damages are for making the unauthorised copy, per book.

    And it is not STEALING, as you put it; it is making an unauthorised copy. No one loses anything from a copy being made; if I STEAL your phone, you no longer have that phone. I do find it sad how many people have bought into the capitalist IP-maximalist stance and have somehow convinced themselves that advocating for Disney and the publishing cartel being allowed to dictate how people use works they have is somehow sticking up for the little guy.

  • Nobody who is mad at this situation thinks that taking inspiration, riffing on, or referencing other people’s work is the problem when a human being does it. When a person writes, there is intention behind it.

    The issue is when a business, owned by those people you think ‘demonised’ inspiration, takes the works of authors and mulches them into something they lovingly named “The Pile”, in order to create derivative slop off the backs of creatives.

    When you, as a “professional”, ask AI to write you a novel, who is being inspired? Who is making the connections between themes? Who is carefully crafting the text to pay loving reference to another author's work? Not you. Not the algorithm that is guessing what word to shit out next based on math.

    These businesses have tricked you into thinking that what they are doing is noble.

    That's 100% rationalization. Machines have never done anything with "inspiration", and that's never been a problem until now. You probably don't insist that your food be hand-carried to you from a farm, or cooked over a fire you started by rubbing two sticks together. I think the mass reaction against AI is part of a larger pattern where people want to believe they're crusading against evil without putting out the kind of effort it takes to fight any of the genuine evils in the world.

  • Human learning requires understanding, which AI is not capable of.

    How could anyone know this?

    Is there some test of understanding that humans can pass and AIs can't? And if there are humans who can't pass it, do we consider them unintelligent?

    We don't even need to set the bar that high. Is there some definition of "understanding" that humans meet and AIs don't?

    It's literally in the phrase "statistically optimized." This is like arguing for your preferred deity. It'll never be proven, but we have evidence to draw our own conclusions. As it is now, AI doesn't learn or understand the same way humans do.

  • It's literally in the phrase "statistically optimized." This is like arguing for your preferred deity. It'll never be proven, but we have evidence to draw our own conclusions. As it is now, AI doesn't learn or understand the same way humans do.

    So you’re confident that human learning involves “understanding” which is distinct from “statistical optimization”. Is this something you feel in your soul or can you define the difference?