
AI industry horrified to face largest copyright class action ever certified

Technology
  • People cheering for this have no idea of the consequence of their copyright-maximalist position.

    If using images, text, etc. to train a model is copyright infringement, then there will be NO open models, because open-source model creators could not possibly obtain licensing for every piece of written or visual media in the Common Crawl dataset, which is what most of these things are trained on.

    As it stands now, corporations don't have a monopoly on AI specifically because copyright doesn't apply to AI training. Everyone has access to Common Crawl and the other large, public datasets made from crawling the public Internet, so anyone can train a model on their own without worrying about obtaining billions of different licenses from every single individual who has ever written a word or drawn a picture.

    If there is a ruling that training violates copyright, then the only entities that could possibly afford to train LLMs or diffusion models are companies that own a large amount of copyrighted material. Sure, one company will lose a lot of money and/or be destroyed, but the legal precedent would be set so that it is impossible for anyone who doesn't have billions of dollars to train AI.

    People are shortsightedly seeing this as a victory for artists or some other nonsense. It's not. This is a fight where large copyright holders (Disney and other large publishing companies) want to completely own the ability to train AI because they own most of the large stores of copyrighted material.

    If the copyright holders win this, then open training material like Common Crawl would be completely unusable for training models in the US/the West, because any person who has ever posted anything to the Internet in the last 25 years could simply sue for copyright infringement.

    Anybody can use copyrighted works under fair use for research, all the more so if your LLM is open source (I would say this fair use should only apply if your model is open source...).
    You are wrong.

    We don't need to break the copyright protections that shield us from corporations in this case, and that also, incidentally, protect open source and libre software.

  • Distributed computing projects, large non-profits, people in the near future with much more powerful and cheaper hardware, governments which are interested in providing public services to their citizens, etc.

    Look at other large technology projects. The Human Genome Project spent $3 billion to sequence the first genome but now you can have it done for around $500. This cost reduction is due to the massive, combined effort of tens of thousands of independent scientists working on the same problem. It isn't something that would have happened if Purdue Pharma owned the sequencing process and required every scientist to purchase a license from them in order to do research.

    LLM and diffusion models are trained on the works of everyone who's ever been online. This work, generated by billions of human-hours, is stored in the Common Crawl datasets and is freely available to anyone who wants it. This data is both priceless and owned by everyone. We should not be cheering for a world where it is illegal to use this dataset that we all created and, instead, we are forced to license massive datasets from publishing companies.

    The progress on these types of models would immediately stop; there would be 3-4 corporations that could afford the licenses. They would have a de facto monopoly on LLMs and could enshittify them without worry of competition.

    The world you're envisioning would only have paid licenses; who's to say we can't have a "free for non-commercial purposes" license style for it all?

  • Let's go baby! The law is the law, and it applies to everybody

    If the "genie doesn't go back in the bottle", make him pay for what he's stealing.

    The law is not the law.
    I am the law.

    insert awesome guitar riff here

    Reference: https://youtu.be/Kl_sRb0uQ7A

  • This is the real concern. Copyright abuse has been rampant for a long time, and the only reason things like the Internet Archive are allowed to exist is because the copyright holders don't want to pick a fight they could potentially lose and lessen their hold on the IPs they're hoarding. The AI case is the perfect thing for them, because it's a very clear violation with a good amount of public support on their side, and winning will allow them to crack down even harder on all the things like the Internet Archive that should be fair use. AI is bad, but this fight won't benefit the public either way.

    I wouldn't even say AI is bad. I currently have Qwen 3 running on my own GPU, giving me a course in RegEx and how to use it. It sometimes makes mistakes in the examples (we all know that chatbots are shit when it comes to the r's in "strawberry"), but I see it as "spot the error" training for me, and the instructions themselves have been error-free so far; since I do the lessons myself, I can easily spot if something goes wrong.

    AI crammed into everything because venture capitalists are trying to see what sticks is probably the main reason public opinion of chatbots is bad, and I don't condone that either, but the technology itself has uses and is an impressive accomplishment.

    Same with image generation: I am shit at drawing, and I don't have the money to commission art when I want something specific, but I can generate what I want for myself.

    If the copyright side wins, we all might lose the option to run image generation and LLMs on our own hardware, there will never be an open-source LLM, and resources that are important to us all will come under even more fire than they already are. Copyright holders will be the new AI companies, and without competition the enshittification will start instantly.
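
    As an aside, the "r's in strawberry" failure is easy to check against ground truth. A minimal Python sketch (illustrative only, not tied to Qwen or any particular model) that counts the letters deterministically, the way a chatbot often can't:

    ```python
    import re

    # Chat models see tokens rather than individual characters,
    # which is why letter-counting questions tend to trip them up.
    word = "strawberry"

    # Two equivalent deterministic counts of the letter "r":
    regex_count = len(re.findall(r"r", word))
    plain_count = word.count("r")

    print(regex_count, plain_count)  # prints: 3 3
    ```

    Comparing a model's answer against a check like this is exactly the "spot the error" exercise described above.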

  • Well, theft has never been the best foundation for a business, has it?

    While I completely agree that copyright terms are wildly overblown, they are valid law that other people suffer under, so it is 100% fair to make these companies suffer the same. Or worse, as they all broke the law for commercial gain.

    Well, theft has never been the best foundation for a business, has it?

    History would suggest otherwise.

    I wouldn't even say AI is bad. I currently have Qwen 3 running on my own GPU, giving me a course in RegEx and how to use it. It sometimes makes mistakes in the examples (we all know that chatbots are shit when it comes to the r's in "strawberry"), but I see it as "spot the error" training for me, and the instructions themselves have been error-free so far; since I do the lessons myself, I can easily spot if something goes wrong.

    AI crammed into everything because venture capitalists are trying to see what sticks is probably the main reason public opinion of chatbots is bad, and I don't condone that either, but the technology itself has uses and is an impressive accomplishment.

    Same with image generation: I am shit at drawing, and I don't have the money to commission art when I want something specific, but I can generate what I want for myself.

    If the copyright side wins, we all might lose the option to run image generation and LLMs on our own hardware, there will never be an open-source LLM, and resources that are important to us all will come under even more fire than they already are. Copyright holders will be the new AI companies, and without competition the enshittification will start instantly.

    What you see as "spot the error" type training, another person sees as absolute fact that they internalize and use to make decisions that impact the world. The internet gave rise to the golden age of conspiracy theories, which is having a major impact on the worsening political climate, and it's because the average user isn't able to differentiate information from disinformation. AI chatbots giving people the answer they're looking for rather than the truth is only going to compound the issue.

  • This post did not contain any content.

    Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.

    And yet, despite 20 years of experience, the only side Ashley presents is the technologists' side.

  • This post did not contain any content.

    I hope LLMs and generative AI crash and burn.

  • I hope LLMs and generative AI crash and burn.

    I'm thinking, honestly: what if that's the planned purpose of this bubble?

    Let me explain: those "AIs" involve assembling large datasets and making them available, poisoning the Web, and creating demand for a specific kind of hardware.

    When it bursts, not everything bursts.

    Suddenly there will be plenty of no-longer-required hardware usable for normal ML applications: face recognition, voice recognition, text analysis to identify an author, combat drones with target selection, all kinds of stuff. It will be dirt cheap compared to its current price, as Sun hardware was after the dotcom crash.

    There will still be those datasets, which can be analyzed for plenty of purposes. Legal or not, they have already been processed into a usable and convenient state.

    There will be a Web covered with a layer of AI slop as tall as the Great Wall of China.

    There will likely be a bankrupt nation with a lot of things failing because of it.

    And there will still be all the centralized services. Suppose that on that day you search for something on Google and only the Google summary is present, no results list (or maybe there is a results list, but suddenly weighted differently), saying that you've been owned by domestic enemies, yadda yadda, and the patriotic corporations are implementing a popular state of emergency or something like that. You go to Facebook, and when you write something there, your messages are premoderated by an AI so that you can't, God forbid, say something wrong. An LLM might not be able to hold a decent conversation, but editing out things you say, or PGP keys you send, in real time without anything appearing strange? Easily. Or changing some real person's style of speech to yours.

    Suppose all the non-degoogled Android installations start doing things like that; Amazon's logistics suddenly start working to support a putsch; Facebook and WhatsApp do what I described, or just fail; Apple presents a new, magnificent, ingenious, miraculous, patriotic change to a better system of government, maybe even with Jony Ive as the speaker, and possibly does the same unnoticeable censorship; and Microsoft pushes one malicious update three months earlier with a backdoor into all Windows installations doing the same, and commits its datacenters to the common effort. And let's just say it's possible that a similar thing is done by some Linux developer who believes in the idea, and by some of the major distributions; it wouldn't need to do much, just provide a remotely usable backdoor.

    I don't list Twitter because, honestly, it doesn't seem to work well enough or have good enough coverage.

    So this seems a pretty plausible apocalypse scenario, one that leads to the sudden installation of a dictatorial regime with all the necessary surveillance, planning, censorship, and enforcement already functioning as systems.

    Of course, apocalypse scenarios have been a normal thing in movies for many years and many times over, but it's funny how the more plausible such scenarios become, the less often they are described in art.

  • This post did not contain any content.

    Fucking good!! Let the AI industry BURN!

  • IA doesn't make any money off the content. Not that LLM companies do, but that's what they'd want.

    And this is exactly why I think the IA will be forced to close down, while AI companies that trained their models on it will not only stay but be praised for preserving information, in an ironic twist. Because one side participates in capitalism and the other doesn't. They will claim AI is transformative enough even when it isn't, because the overly rich have invested too much money into the grift.

  • Ah yes. "Public Domain" == "Theft"

    Not everything is public domain, thief scum.

  • I propose that anyone defending themselves in court over AI stealing data must be represented exclusively by AI.

    That would be glorious. If the future of your company depends on the LLM keeping track of hundreds of details and drawing the right conclusions, it's game over on the first day.

  • This post did not contain any content.

    Good!!! Let the AI industry fucking burn!!!

  • Not everything is public domain, thief scum.

    Do they even teach the constitution anymore?

  • Peripheral Intravenous (IV) Catheter Market

    Technology
    0 votes
    1 post
    4 views
    Nobody has replied
  • 17 votes
    4 posts
    12 views
    Z
    That's because it's mostly blah, blah.
  • 44 votes
    10 posts
    57 views
    muusemuuse@sh.itjust.works
    Hospitals would likely be fine with it. The health insurance industry would not, though, and would pressure the hospital to cut you off. It's illegal, but they would do it anyway. You would need serious fuck-you money to change this. And even then, probably a lot of Luigis too.
  • 112 votes
    2 posts
    31 views
    W
    "...the ruling stopped short of ordering the government to recover past messages that may already have been lost." How would somebody be meant to comply with an order to recover a message that has already been deleted? Or is that the point? You can't comply, and you're in contempt of court.
  • Acute Leukemia Burden Trends and Future Predictions

    Technology
    5 votes
    5 posts
    57 views
    G
    Looks like the delay in 2011 was so big that the data became available after the 2017 one.
  • Selling Surveillance as Convenience

    Technology
    112 votes
    13 posts
    104 views
    E
    Trying to get my peers to care about their own privacy is exhausting. I wish their choices didn't affect me, but as this article states, they do in the long run. I will remain stubborn and only compromise rather than give in.
  • Why doesn't Nvidia have more competition?

    Technology
    33 votes
    22 posts
    242 views
    B
    It's funny how the article asks the question but completely fails to answer it.

    About 15 years ago, Nvidia discovered there was demand for compute in datacenters that could be met with powerful GPUs, and it was quick to respond, with the resources to focus on it strongly thanks to its huge success and high profitability in the GPU market. AMD also saw the market and wanted to pursue it, but just over a decade ago, when the high profit potential began to show clearly, AMD was near bankrupt and very hard pressed to finance development of GPU compute for datacenters. AMD tried its best and was moderately successful from a technology perspective, but Nvidia already had a head start, and its proprietary development platform CUDA was an established standard that was very hard to penetrate.

    Intel simply fumbled the ball from start to finish. It spent a decade trying to knock ARM off the mobile crown, investing billions (roughly the equivalent of ARM's total revenue), and never managed to catch up despite having the better production process at the time. That was Intel's main focus, and Intel believed GPUs would never be more than a niche product. So when Intel tried to compete on datacenter compute, it did so with x86 chips; one of its boldest efforts was a monstrous cluster of Celeron chips, which of course performed laughably badly compared to Nvidia. Because, as it turns out, the way forward, at least for now, is indeed the massively parallel compute capability of a GPU, which Nvidia has refined for decades, with only (inferior) competition from AMD. But despite the lack of competition, Nvidia did not slow down; in fact, with increased profits it only grew bolder, making it even harder to catch up.

    Now AMD has had more money to compete for a while, and it does have some decent compute units, but Nvidia remains ahead and the CUDA problem is still there, so for AMD to really compete it has to be better in order to attract customers. That's a very tall order against an Nvidia that simply never seems to stop progressing. So the only other option for AMD is to sell a bit cheaper, which I suppose it has to. AMD and Intel were the obvious competitors; everybody else is coming from even further behind.

    But if I had to make a bet, it would be on Huawei. Huawei has some crazy good developers, and Trump is basically forcing them to figure it out themselves, because he is blocking Huawei and China in general from using both AMD and Nvidia AI chips. The chips will probably be made by China's SMIC, because they are also barred from advanced production in the West, most notably at TSMC. China will prevail, because it has become a national project of both prestige and necessity, and they have a massive talent pool and resources, so nothing can stop it now. IMO the USA would clearly have been better off allowing China to use American chips; now China will soon compete directly on both production and design too.