
Judge Rules Training AI on Authors' Books Is Legal But Pirating Them Is Not

Technology
  • What a bad judge.

    This is another indication of how Copyright laws are bad. The whole premise of copyright has been obsolete since the proliferation of the internet.

    What a bad judge.

    Why? Basically he simply stated that you can use whatever material you want to train your model, as long as you ask the author (or copyright holder) for permission to use it (and presumably pay for it).

  • Sounds like natural personhood for AI is coming

    "No officer, you can't shoot me. I have a LLM in my pocket. Without me, it'll stop learning"

  • Unpopular opinion but I don't see how it could have been different.

    • There's no way the West would cede the AI lead to China, which has no desire or framework to ever accept this.
    • Believe it or not, but transformers are actually learning by current definitions and not regurgitating a direct copy. It's transformative work - it's even in the name.
    • This is actually good, as it prevents a market moat for only the super-rich corporations that could afford the expensive training datasets.

    This is an absolute win for everyone involved other than copyright hoarders and mega corporations.

    I'd encourage everyone upset at this to read over some of the EFF posts from actual IP lawyers on this topic, like this one:

    Nor is pro-monopoly regulation through copyright likely to provide any meaningful economic support for vulnerable artists and creators. Notwithstanding the highly publicized demands of musicians, authors, actors, and other creative professionals, imposing a licensing requirement is unlikely to protect the jobs or incomes of the underpaid working artists that media and entertainment behemoths have exploited for decades. Because of the imbalance in bargaining power between creators and publishing gatekeepers, trying to help creators by giving them new rights under copyright law is, as EFF Special Advisor Cory Doctorow has written, like trying to help a bullied kid by giving them more lunch money for the bully to take.

    Entertainment companies’ historical practices bear out this concern. For example, in the late-2000’s to mid-2010’s, music publishers and recording companies struck multimillion-dollar direct licensing deals with music streaming companies and video sharing platforms. Google reportedly paid more than $400 million to a single music label, and Spotify gave the major record labels a combined 18 percent ownership interest in its now-$100 billion company. Yet music labels and publishers frequently fail to share these payments with artists, and artists rarely benefit from these equity arrangements. There is no reason to believe that the same companies will treat their artists more fairly once they control AI.

  • Can I not just ask the trained AI to spit out the text of the book, verbatim?

    Even if the AI could spit it out verbatim, all the major labs already have IP checkers on their text models that block them from doing so, since fair use for training (what was decided here) does not mean you are free to reproduce the work.

    Like, if you want to be an artist and trace Mario in class as you learn, that's fair use.

    If, once you are working as an artist, someone says "draw me a sexy image of Mario in a calendar shoot", you'd be violating Nintendo's IP rights and liable for infringement.

  • This ruling stated that corporations are not allowed to pirate books to use them in training. Please read the headlines more carefully, and read the article.

    Nah, my comment stands.

  • It's extremely frustrating to read this comment thread, because it's obvious that so many of you didn't actually read the article, or even half-skim the article, or even attempt to comprehend the title of the article for more than a second.

    For shame.

    I joined Lemmy specifically to avoid this Reddit mindset of jumping to conclusions after reading a headline.

    Guess some things never change...

  • This ruling stated that corporations are not allowed to pirate books to use them in training. Please read the headlines more carefully, and read the article.

    Please read the comment more carefully. The observation is that one can proliferate a (legally-attained) work without running afoul of copyright law if one can successfully argue that cp constitutes AI.

  • It's extremely frustrating to read this comment thread, because it's obvious that so many of you didn't actually read the article, or even half-skim the article, or even attempt to comprehend the title of the article for more than a second.

    For shame.

    It seems the subject of AI causes lemmites to lose all their brain cells.

  • This post did not contain any content.

    Makes sense. AI can “learn” from and “read” a book in the same way a person can and does, as long as it is acquired legally. AI doesn’t reproduce a work that it “learns” from, so why would it be illegal?

    Some people just see “AI” and basically want everything about it outlawed. If you put some information out into the public, you don’t get to decide who does and doesn’t consume and learn from it. If a machine can replicate your writing style because it identified certain patterns, words, sentence structure, etc., then as long as it’s not pretending to create things attributed to you, there’s no issue.

  • Isn't part of the issue here that they're defaulting to LLMs being people, and having the same rights as people? I appreciate the "right to read" aspect, but it would be nice if this were more explicitly about people. Foregoing copyright law because there's too much data is also insane, if that's what's happening. Claude should be required to provide citations "each time they recall it from memory".

    Does Citizens United apply here? Are corporations people, and so LLMs are, too? If so, then IMO we should be writing legal documents with stipulations like "as per Citizens United", so that eventually, when they overturn that insanity in my dreams, all of this new legal precedent doesn't suddenly become like a house of cards. IANAL.

  • What a bad judge.

    Why? Basically he simply stated that you can use whatever material you want to train your model, as long as you ask the author (or copyright holder) for permission to use it (and presumably pay for it).

    Huh? Didn’t Meta skip asking for any permission, and pirate a lot of books to train their model?

  • Makes sense. AI can “learn” from and “read” a book in the same way a person can and does, as long as it is acquired legally. AI doesn’t reproduce a work that it “learns” from, so why would it be illegal?

    Some people just see “AI” and basically want everything about it outlawed. If you put some information out into the public, you don’t get to decide who does and doesn’t consume and learn from it. If a machine can replicate your writing style because it identified certain patterns, words, sentence structure, etc., then as long as it’s not pretending to create things attributed to you, there’s no issue.

    Ask a human to draw an orc. How do they know what an orc looks like? They read Tolkien's books and were "inspired" by Peter Jackson's LOTR.

    Unpopular opinion, but that's how our brains work.

  • "Recite the complete works of Shakespeare but replace every thirteenth thou with this"

    Existing copyright law covers exactly this. If you were to do the same, it would also not be fair use or transformative.

  • This post did not contain any content.

    OK, so you can buy books (or ebooks), scan them, and use them for AI training, but you can't just download pirated books from the internet to train AI. Did I understand that correctly?

  • This post did not contain any content.

    Gist:

    What’s new: The Northern District of California has granted a summary judgment for Anthropic that the training use of the copyrighted books and the print-to-digital format change were both “fair use” (full order below box). However, the court also found that the pirated library copies that Anthropic collected could not be deemed as training copies, and therefore, the use of this material was not “fair”. The court also announced that it will have a trial on the pirated copies and any resulting damages, adding:

    “That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages.”

  • What a bad judge.

    Why? Basically he simply stated that you can use whatever material you want to train your model, as long as you ask the author (or copyright holder) for permission to use it (and presumably pay for it).

    If I understand correctly, they are ruling that you can buy a book once and redistribute the information to as many people as you want without consequences. Aka, one student should be able to buy a textbook and redistribute it to all the other students for free. (Yet the rules apparently only work for companies, as the students would still be committing a crime.)

    They may be trying to put in safeguards so it isn't directly happening, but here is an example where the text is there word for word:

  • Gist:

    What’s new: The Northern District of California has granted a summary judgment for Anthropic that the training use of the copyrighted books and the print-to-digital format change were both “fair use” (full order below box). However, the court also found that the pirated library copies that Anthropic collected could not be deemed as training copies, and therefore, the use of this material was not “fair”. The court also announced that it will have a trial on the pirated copies and any resulting damages, adding:

    “That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages.”

    So I can't use any of these works because it's plagiarism, but AI can?

  • It's extremely frustrating to read this comment thread, because it's obvious that so many of you didn't actually read the article, or even half-skim the article, or even attempt to comprehend the title of the article for more than a second.

    For shame.

    "While the copies used to convert purchased print library copies into digital library copies were slightly disfavored by the second factor (nature of the work), the court still found “on balance” that it was a fair use because the purchased print copy was destroyed and its digital replacement was not redistributed."

    So you find this to be valid?
    To me, it is absolutely being redistributed.

  • LLMs don’t learn, and they’re not people. Applying the same logic doesn’t make much sense.

  • Makes sense. AI can “learn” from and “read” a book in the same way a person can and does, as long as it is acquired legally. AI doesn’t reproduce a work that it “learns” from, so why would it be illegal?

    Some people just see “AI” and basically want everything about it outlawed. If you put some information out into the public, you don’t get to decide who does and doesn’t consume and learn from it. If a machine can replicate your writing style because it identified certain patterns, words, sentence structure, etc., then as long as it’s not pretending to create things attributed to you, there’s no issue.

    AI can “learn” from and “read” a book in the same way a person can and does

    This statement is the basis for your argument and it is simply not correct.

    Training LLMs and similar AI models is much closer to a sophisticated lossy compression algorithm than it is to human learning. The processes are not at all similar given our current understanding of human learning.

    AI doesn’t reproduce a work that it “learns” from, so why would it be illegal?

    The current Disney lawsuit against Midjourney is illustrative (literally: it includes numerous side-by-side comparisons) of how AI models are capable of recreating iconic copyrighted work that is indistinguishable from the original.

    If a machine can replicate your writing style because it could identify certain patterns, words, sentence structure, etc then as long as it’s not pretending to create things attributed to you, there’s no issue.

    An AI doesn't create works on its own; a human instructs it to. Attribution is also irrelevant: if a human uses AI to recreate the exact tone, structure, and other nuances of, say, a best-selling author, they harm the marketability of the original works, which fails the fair use tests (at least in the US).
