linux-nerds.org

AI industry horrified to face largest copyright class action ever certified

Technology

77 Posts 46 Posters 0 Views

S sugarcatdestroyer@lemmy.world

I just remembered the movie where a genie was released from its bottle: he turned the world into chaos by freeing his own kind, and if it weren't for the power of the plot, I'm afraid the people there would have become slaves or died out.

Although here it is already necessary to file a lawsuit for theft of the soul in the literal sense of the word.
hugenerd@lemmy.ca

wrote

#55

I remember that X-Files episode!
1 Reply Last reply

0
D davriellelouna@lemmy.world

This post did not contain any content.
hugenerd@lemmy.ca

wrote

#56

Too late. The systems we are building as a species will soon become sentient. We'll have aliens right here, no UFOs required. Where the music comes from will no longer be relevant.
1 Reply Last reply

3
H hugenerd@lemmy.ca

I remember that X-Files episode!
sugarcatdestroyer@lemmy.world

wrote

#57

Damn, what did you watch those masterpieces on? What kind of smoke were you sitting on then? Although I don't know what secret materials you're talking about. Maybe I watched something wrong... And which episode?
1 Reply Last reply

0
K kibiz0r@midwest.social

They don’t want copyright power to expand further. And I agree with them, despite hating AI vendors with a passion.

For an understanding of the collateral damage, check out How To Think About Scraping by Cory Doctorow.
westingham@sh.itjust.works

wrote

#58

Ahhh, it makes more sense now. Thank you!
1 Reply Last reply

1
A a_wild_mimic_appears@lemmy.dbzer0.com

Do you think that would rescue the IA from the type of people who made the IA already pull 300k books?
magikmw@piefed.social

wrote

#59

No. But going after LLMs won't make the situation for IA any worse, not directly anyway.
1 Reply Last reply

0
H hugenerd@lemmy.ca

Too late. The systems we are building as a species will soon become sentient. We'll have aliens right here, no UFOs required. Where the music comes from will no longer be relevant.
explodicle@sh.itjust.works

wrote

#60

Ok perfect so since AGI is right around the corner and this is all irrelevant, then I'm sure the AI companies won't mind paying up.
1 Reply Last reply

1
S sugarcatdestroyer@lemmy.world

Unfortunately, this will probably lead to nothing: in our world, only the poor seem to be punished for stealing. Well, corporations always get away with everything, so we sit on the couch and shout "YES!!!" for the fact that they are trying to console us with this.
modern_medicine_isnt@lemmy.world

wrote

#61

This issue is not so cut and dried. The AI companies are stealing from other companies more than from individual people. Publishing companies are owned by some very rich people. And they want their cut.

This case may have started out with authors, but it is mentioned that it could turn into publishing companies vs AI companies.
1 Reply Last reply

2
M magikmw@piefed.social

No. But going after LLMs won't make the situation for IA any worse, not directly anyway.
a_wild_mimic_appears@lemmy.dbzer0.com

wrote

#62

if the courts decide that scraping is illegal, IA can close up shop.
1 Reply Last reply

1
E explodicle@sh.itjust.works

Ok perfect so since AGI is right around the corner and this is all irrelevant, then I'm sure the AI companies won't mind paying up.
hugenerd@lemmy.ca

wrote

#63

That's not the way it works. Do you think the Roman Empire just picked a particular Tuesday to collapse? It's a process and will take a while.
1 Reply Last reply

0
D davriellelouna@lemmy.world

This post did not contain any content.
fauxliving@lemmy.world

wrote, last edited by fauxliving@lemmy.world

#64

People cheering for this have no idea of the consequence of their copyright-maximalist position.

If using images, text, etc. to train a model is copyright infringement then there will be NO open models because open source model creators could not possibly obtain all of the licensing for every piece of written or visual media in the Common Crawl dataset, which is what most of these things are trained on.

As it stands now, corporations don't have a monopoly on AI specifically because copyright doesn't apply to AI training. Everyone has access to Common Crawl and the other large, public, datasets made from crawling the public Internet and so anyone can train a model on their own without worrying about obtaining billions of different licenses from every single individual who has ever written a word or drawn a picture.

If there is a ruling that training violates copyright then the only entities that could possibly afford to train LLMs or diffusion models are companies that own a large amount of copyrighted materials. Sure, one company will lose a lot of money and/or be destroyed, but the legal precedent would be set so that it is impossible for anyone that doesn't have billions of dollars to train AI.

People are shortsightedly seeing this as a victory for artists or some other nonsense. It's not. This is a fight where large copyright holders (Disney and other large publishing companies) want to completely own the ability to train AI because they own most of the large stores of copyrighted material.

If the copyright holders win this then the open source training material, like Common Crawl, would be completely unusable to train models in the US/the West because any person who has ever posted anything to the Internet in the last 25 years could simply sue for copyright infringement.
3 Replies Last reply

9
Z zetta@mander.xyz

The law absolutely does not apply to everybody, and you are well aware of that.
astralpath@lemmy.ca

wrote

#65

Shouldn't it?
1 Reply Last reply

0
J jason2357@lemmy.ca

Take scraping. Companies like Clearview will tell you that scraping is legal under copyright law. They’ll tell you that training a model with scraped data is also not a copyright infringement. They’re right.

I love Cory's writing, but while he does a masterful job of defending scraping, and makes a good argument that in most cases, it's laws other than Copyright that should be the battleground, he does, kinda, trip over the main point.

That is that training models on creative works and then selling access to the derivative "creative" works that those models output very much falls within the domain of copyright - on either side of a grey line we usually call "fair use" that hasn't been really tested in courts.

Let's take two absurd extremes to make the point. Say I train an LLM directly on Marvel movies, and then sell movies (or maybe movie scripts) that are almost identical to existing Marvel movies (maybe with a few key names and features altered). I don't think anyone would argue that is not a derivative work, or that falls under "fair use." However, if I used literature to train my LLM to be able to read, and used that to read street signs for my self-driving car, well, yeah, that might be something you could argue is "fair use" to sell. It's not producing copy-cat literature.

I agree with Cory that scraping, per se, is absolutely fine, and even re-distributing the results in some ways that are in the public interest or fall under "fair use", but it's hard to justify the slop machines as not a copyright problem.

In the end, yeah, fuck both sides anyway. Copyright was extended too far and used for far too much, and the AI companies are absolute thieves. I have no illusions this type of court case will do anything more than shift wealth from one robber baron to another, and won't help artists and authors.
fauxliving@lemmy.world

wrote

#66

Say I train an LLM directly on Marvel movies, and then sell movies (or maybe movie scripts) that are almost identical to existing Marvel movies (maybe with a few key names and features altered). I don’t think anyone would argue that is not a derivative work, or that falls under “fair use.”

I think you're failing to differentiate between a work, which is protected by copyright, vs a tool which is not affected by copyright.

Say I use Photoshop and Adobe Premiere to create a script and movie which are almost identical to existing Marvel movies. I don't think anyone would argue that is not a derivative work, or that falls under "fair use".

The important part here is that the subject of this sentence is 'a work which has been created which is substantially similar to an existing copyrighted work'. This situation is already covered by copyright law. If a person draws a Mickey Mouse and tries to sell it then Disney will sue them, not their pencil.

Specific works are copyrighted and copyright laws create a civil liability for a person who creates works that are substantially similar to a copyrighted work.

Copyright doesn't allow publishers to go after Adobe because a person used Photoshop to make a fake Disney poster. This is why things like Bittorrent can legally exist despite being used primarily for copyright violation. Copyright laws apply to people and the works that they create.

A generated Marvel movie is substantially similar to a copyrighted Marvel movie and so copyright law applies to it. A diffusion model is not substantially similar to any copyrighted work by Disney and so copyright laws don't apply here.
1 Reply Last reply

1
Z zetta@mander.xyz

The law absolutely does not apply to everybody, and you are well aware of that.
jsomae@lemmy.ml

wrote

#67

The law applies to everybody, but the law-makers change the laws to benefit certain people. And then Trump pardons the rest lol.
1 Reply Last reply

0
D davriellelouna@lemmy.world

This post did not contain any content.
jsomae@lemmy.ml

wrote

#68

Would really love to see IP law get taken down a notch out of this.
1 Reply Last reply

2
F fauxliving@lemmy.world

Say I train an LLM directly on Marvel movies, and then sell movies (or maybe movie scripts) that are almost identical to existing Marvel movies (maybe with a few key names and features altered). I don’t think anyone would argue that is not a derivative work, or that falls under “fair use.”

I think you're failing to differentiate between a work, which is protected by copyright, vs a tool which is not affected by copyright.

Say I use Photoshop and Adobe Premiere to create a script and movie which are almost identical to existing Marvel movies. I don't think anyone would argue that is not a derivative work, or that falls under "fair use".

The important part here is that the subject of this sentence is 'a work which has been created which is substantially similar to an existing copyrighted work'. This situation is already covered by copyright law. If a person draws a Mickey Mouse and tries to sell it then Disney will sue them, not their pencil.

Specific works are copyrighted and copyright laws create a civil liability for a person who creates works that are substantially similar to a copyrighted work.

Copyright doesn't allow publishers to go after Adobe because a person used Photoshop to make a fake Disney poster. This is why things like Bittorrent can legally exist despite being used primarily for copyright violation. Copyright laws apply to people and the works that they create.

A generated Marvel movie is substantially similar to a copyrighted Marvel movie and so copyright law applies to it. A diffusion model is not substantially similar to any copyrighted work by Disney and so copyright laws don't apply here.
glog78@digitalcourage.social

wrote

#69

@FauxLiving @Jason2357
I take a bold stand on the whole topic:
I think AI is a big scam (pattern matching has nothing to do with !!! intelligence !!!).
And this scam might end like the dot-com bubble of the late 90s ( https://en.wikipedia.org/wiki/Dot-com_bubble ), including the huge economic impact, because too many people have invested in an "idea" and not in a proven technology.
And as with the dot-com bubble, once the AI bubble has been cleaned up, machine learning and vector databases will stay forever (and maybe some other parts of the tech).
Neither needs copyright changes, because they will never try to be one solution for everything. Like a small model to transform text to speech ... like a small model to translate ... like a full-text search using a vector DB to index all local documents ...
Like a small tool to summarize text.
1 Reply Last reply

0
F fauxliving@lemmy.world

People cheering for this have no idea of the consequence of their copyright-maximalist position.

If using images, text, etc. to train a model is copyright infringement then there will be NO open models because open source model creators could not possibly obtain all of the licensing for every piece of written or visual media in the Common Crawl dataset, which is what most of these things are trained on.

As it stands now, corporations don't have a monopoly on AI specifically because copyright doesn't apply to AI training. Everyone has access to Common Crawl and the other large, public, datasets made from crawling the public Internet and so anyone can train a model on their own without worrying about obtaining billions of different licenses from every single individual who has ever written a word or drawn a picture.

If there is a ruling that training violates copyright then the only entities that could possibly afford to train LLMs or diffusion models are companies that own a large amount of copyrighted materials. Sure, one company will lose a lot of money and/or be destroyed, but the legal precedent would be set so that it is impossible for anyone that doesn't have billions of dollars to train AI.

People are shortsightedly seeing this as a victory for artists or some other nonsense. It's not. This is a fight where large copyright holders (Disney and other large publishing companies) want to completely own the ability to train AI because they own most of the large stores of copyrighted material.

If the copyright holders win this then the open source training material, like Common Crawl, would be completely unusable to train models in the US/the West because any person who has ever posted anything to the Internet in the last 25 years could simply sue for copyright infringement.
justaraccoon@lemmy.world

wrote

#70

In theory sure, but in practice who has the resources to do large scale model training on huge datasets other than large corporations?
1 Reply Last reply

3
A a_wild_mimic_appears@lemmy.dbzer0.com

But it would also mean that the Internet Archive is illegal, even tho they don't profit, but if scraping the internet is a copyright violation, then they are as guilty as Anthropic.
umbrella@lemmy.ml

wrote

#71

i say move it out of the us
1 Reply Last reply

0
J justaraccoon@lemmy.world

In theory sure, but in practice who has the resources to do large scale model training on huge datasets other than large corporations?
fauxliving@lemmy.world

wrote, last edited by fauxliving@lemmy.world

#72

Distributed computing projects, large non-profits, people in the near future with much more powerful and cheaper hardware, governments which are interested in providing public services to their citizens, etc.

Look at other large technology projects. The Human Genome Project spent $3 billion to sequence the first genome but now you can have it done for around $500. This cost reduction is due to the massive, combined effort of tens of thousands of independent scientists working on the same problem. It isn't something that would have happened if Purdue Pharma owned the sequencing process and required every scientist to purchase a license from them in order to do research.

LLM and diffusion models are trained on the works of everyone who's ever been online. This work, generated by billions of human-hours, is stored in the Common Crawl datasets and is freely available to anyone who wants it. This data is both priceless and owned by everyone. We should not be cheering for a world where it is illegal to use this dataset that we all created and, instead, we are forced to license massive datasets from publishing companies.

Progress on these types of models would immediately stop; there would be 3-4 corporations that could afford the licenses. They would have a de facto monopoly on LLMs and could enshittify them without worry of competition.
1 Reply Last reply

2
F fauxliving@lemmy.world

People cheering for this have no idea of the consequence of their copyright-maximalist position.

If using images, text, etc. to train a model is copyright infringement then there will be NO open models because open source model creators could not possibly obtain all of the licensing for every piece of written or visual media in the Common Crawl dataset, which is what most of these things are trained on.

As it stands now, corporations don't have a monopoly on AI specifically because copyright doesn't apply to AI training. Everyone has access to Common Crawl and the other large, public, datasets made from crawling the public Internet and so anyone can train a model on their own without worrying about obtaining billions of different licenses from every single individual who has ever written a word or drawn a picture.

If there is a ruling that training violates copyright then the only entities that could possibly afford to train LLMs or diffusion models are companies that own a large amount of copyrighted materials. Sure, one company will lose a lot of money and/or be destroyed, but the legal precedent would be set so that it is impossible for anyone that doesn't have billions of dollars to train AI.

People are shortsightedly seeing this as a victory for artists or some other nonsense. It's not. This is a fight where large copyright holders (Disney and other large publishing companies) want to completely own the ability to train AI because they own most of the large stores of copyrighted material.

If the copyright holders win this then the open source training material, like Common Crawl, would be completely unusable to train models in the US/the West because any person who has ever posted anything to the Internet in the last 25 years could simply sue for copyright infringement.
lustyargonianmana@lemmy.world

wrote

#73

Copyright is a leftover mechanism from slavery and it will be interesting to see how it gets challenged here, given that the wealthy view AI as an extension of themselves and not as a normal employee. Genuinely think the copyright cases from AI will be huge.
1 Reply Last reply

2
S smoogs@lemmy.world

And you’re just crying that you can’t steal.
rivalarrival@lemmy.today

wrote

#74

Ah yes. "Public Domain" == "Theft"
1 Reply Last reply

0
