AI industry horrified to face largest copyright class action ever certified
-
People cheering for this have no idea of the consequence of their copyright-maximalist position.
If using images, text, etc. to train a model is copyright infringement, then there will be NO open models, because open source model creators could not possibly obtain all of the licensing for every piece of written or visual media in the Common Crawl dataset, which is what most of these things are trained on.
As it stands now, corporations don't have a monopoly on AI specifically because copyright doesn't apply to AI training. Everyone has access to Common Crawl and the other large, public, datasets made from crawling the public Internet and so anyone can train a model on their own without worrying about obtaining billions of different licenses from every single individual who has ever written a word or drawn a picture.
If there is a ruling that training violates copyright then the only entities that could possibly afford to train LLMs or diffusion models are companies that own a large amount of copyrighted materials. Sure, one company will lose a lot of money and/or be destroyed, but the legal precedent would be set so that it is impossible for anyone that doesn't have billions of dollars to train AI.
People are shortsightedly seeing this as a victory for artists or some other nonsense. It's not. This is a fight where large copyright holders (Disney and other large publishing companies) want to completely own the ability to train AI because they own most of the large stores of copyrighted material.
If the copyright holders win this then the open source training material, like Common Crawl, would be completely unusable to train models in the US/the West because any person who has ever posted anything to the Internet in the last 25 years could simply sue for copyright infringement.
Anybody can use copyrighted works under fair use for research, more so if your LLM model is open source (I would say this fair use should only actually apply if your model is open source...).
You are wrong. We don't need to break the copyright protections that shield us from corporations in this case, and that incidentally also protect open source and libre software.
-
Distributed computing projects, large non-profits, people in the near future with much more powerful and cheaper hardware, governments which are interested in providing public services to their citizens, etc.
Look at other large technology projects. The Human Genome Project spent $3 billion to sequence the first genome but now you can have it done for around $500. This cost reduction is due to the massive, combined effort of tens of thousands of independent scientists working on the same problem. It isn't something that would have happened if Purdue Pharma owned the sequencing process and required every scientist to purchase a license from them in order to do research.
LLM and diffusion models are trained on the works of everyone who's ever been online. This work, generated by billions of human-hours, is stored in the Common Crawl datasets and is freely available to anyone who wants it. This data is both priceless and owned by everyone. We should not be cheering for a world where it is illegal to use this dataset that we all created and, instead, we are forced to license massive datasets from publishing companies.
Progress on these types of models would immediately stop. There would be 3-4 corporations that could afford the licenses. They would have a de facto monopoly on LLMs and could enshittify them without worry of competition.
The world you're envisioning would only have paid licenses, who's to say we can't have a "free for non commercial purposes" license style for it all?
-
Let's go baby! The law is the law, and it applies to everybody
If the "genie doesn't go back in the bottle", make him pay for what he's stealing.
The law is not the law.
I am the law. (insert awesome guitar riff here)
Reference: https://youtu.be/Kl_sRb0uQ7A
-
This is the real concern. Copyright abuse has been rampant for a long time, and the only reason things like the Internet Archive are allowed to exist is because the copyright holders don't want to pick a fight they could potentially lose and lessen their hold on the IPs they're hoarding. The AI case is the perfect thing for them, because it's a very clear violation with a good amount of public support on their side, and winning will allow them to crack down even harder on all the things like the Internet Archive that should be fair use. AI is bad, but this fight won't benefit the public either way.
I wouldn't even say AI is bad. I currently have Qwen 3 running on my own GPU, giving me a course in RegEx and how to use it. It sometimes makes mistakes in the examples (we all know that chatbots are shit when it comes to the r's in strawberry), but I see it as "spot the error" type of training for me, and the instructions themselves have been error free so far. Since I do the lesson myself, I can easily spot if something goes wrong.
AI crammed into everything because venture capitalists try to see what sticks is probably the main reason public opinion of chatbots is bad, and I don't condone that either, but the technology itself has uses and is an impressive accomplishment.
Same with image generation: I am shit at drawing, and I don't have the money to commission art if I want something specific, but I can generate what I want for myself.
If the copyright side wins, we all might lose the option to run imagegen and LLMs on our own hardware, there will never be an open-source LLM, and resources that are important to us all will come even more under fire than they are already. Copyright holders will be the new AI companies, and without competition the enshittification will instantly start.
-
Well, theft has never been the best foundation for a business, has it?
While I completely agree that copyright terms are completely overblown, they are valid law that other people suffer under, so it is 100% fair to make them suffer the same. Or worse, as they all broke the law for commercial gain.
Well, theft has never been the best foundation for a business, has it?
History would suggest otherwise.
-
What you see as "spot the error" type training, another person sees as absolute fact that they internalize and use to make decisions that impact the world. The internet gave rise to the golden age of conspiracy theories, which is having a major impact on the worsening political climate, and it's because the average user isn't able to differentiate information from disinformation. AI chatbots giving people the answer they're looking for rather than the truth is only going to compound the issue.
-
Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.
And yet, despite 20 years of experience, the only side Ashley presents is the technologists' side.
-
I hope LLMs and generative AI crash and burn.
-
I hope LLMs and generative AI crash and burn.
I'm thinking, honestly, what if that's the planned purpose of this bubble.
To explain: those "AIs" involve assembling large datasets and making them available, poisoning the Web, and creating demand for a specific kind of hardware.
When it bursts, not everything bursts.
Suddenly there will be plenty of no longer required hardware usable for normal ML applications like face recognition, voice recognition, text analysis to identify its author, combat drones with target selection, all kinds of stuff. It will be dirt cheap, compared to its current price, as it was with Sun hardware after the dotcom crash.
There will still be those datasets, which can be analyzed for plenty of purposes. Legal or not, they are already processed into a usable and convenient state.
There will be the Web, covered with a Great-Wall-of-China-tall layer of AI slop.
There will likely be a bankrupt nation which will have a lot of things failing due to that.
And there will still be all the centralized services. Suppose on that day you go search something in Google, and there's only the Google summary present, no results list (or maybe even a results list, whatever, but suddenly weighed differently), saying that you've been owned by domestic enemies yadda-yadda and the patriotic corporations are implementing a popular state of emergency or something like that. You go to Facebook, and when you write something there, your messages are premoderated by an AI so that you'd not be able to god forbid say something wrong. An LLM might not be able to support a decent enough conversation, but to edit out things you say, or PGP keys you send, in real time without anything appearing strange - easily. Or to change some real person's style of speech to yours.
Suppose all of the not-degoogled Android installations start doing things like that, Amazon's logistics suddenly starts working to support a putsch, Facebook and WhatsApp do what I described or just fail, Apple makes a presentation of a new, magnificent, ingenious, miraculous, patriotic change to a better system of government, maybe even with Jony Ive as the speaker, and possibly does the same unnoticeable censorship, Microsoft pushes one malicious update 3 months earlier with a backdoor to all Windows installations doing the same, and commits its datacenters to the common effort, and let's just say it's possible that a similar thing is done by some Linux developer believing in an idea and some of the major distributions. It wouldn't need to do much, just provide a backdoor usable remotely.
I don't list Twitter because honestly it doesn't seem to work well enough or have coverage good enough.
So - this seems a pretty possible apocalypse scenario which does lead to a sudden installation of a dictatorial regime with all the necessary surveillance, planning, censorship and enforcement already being functioning systems.
So - of course apocalypse scenarios were a normal thing in movies for many years and many times, but it's funny how the more plausible such become, the less often they are described in art.
-
Fucking good!! Let the AI industry BURN!
-
IA doesn't make any money off the content. Not that LLM companies do, but that's what they'd want.
And this is exactly the reason why I think the IA will be forced to close down while AI companies that trained their models on it will not only stay but be praised for preserving information in an ironic twist. Because one side does participate in capitalism and the other doesn’t. They will claim AI is transformative enough even when it isn’t because the overly rich invested too much money into the grift.
-
Ah yes. "Public Domain" == "Theft"
Not everything is public domain, thief scum.
-
I propose that anyone defending themselves in court over AI stealing data must be represented exclusively by AI.
That would be glorious. If the future of your company depends on the LLM keeping track of hundreds of details and drawing the right conclusions, it’s game over during the first day.
-
Good!!! Let the AI industry fucking burn!!!
-
Not everything is public domain, thief scum.
Do they even teach the constitution anymore?