
AI industry horrified to face largest copyright class action ever certified

Technology
  • Until they charge people to use their AI.

    It'll be just like today except that it will be illegal for any new companies to try and challenge the biggest players.

    Why would I use their AI? On top of that, wouldn't it be in their best interest to allow people to use their AI with as few restrictions as possible, in order to maximize market saturation?

  • This post did not contain any content.

    An important note here: the judge has already ruled in this case, in the summary judgment order, that using the plaintiffs' works "to train specific LLMs [was] justified as a fair use" because "[t]he technology at issue was among the most transformative many of us will see in our lifetimes."

    The plaintiffs are not suing Anthropic for infringing their copyright by training; the court has already ruled that argument so obviously could not succeed that it was dismissed. Their only remaining claim is that Anthropic downloaded the books from piracy sites using BitTorrent.

    This isn't about LLMs anymore; it's a standard "you downloaded something on BitTorrent and made a company mad" case, the kind that has been going on since Napster.

    Also, the headline is incredibly misleading. It's ascribing feelings to an entire industry based on a common legal filing that is not by itself noteworthy. Unless you really care about legal technicalities, you can stop here.


    The actual news, the new factual thing that happened, is that the Consumer Technology Association and the Computer and Communications Industry Association filed an amicus brief in an appeal of an issue the court ruled against Anthropic on.

    This is a pretty normal legal filing about legal technicalities. It isn't really newsworthy outside of, maybe, some bored people in the legal profession.

    The issue was class certification.

    Three people sued Anthropic. Instead of just suing Anthropic on behalf of themselves, they moved to be certified as a class. That is to say, they wanted to sue on behalf of a larger group of people, in this case a "Pirated Books Class" of authors whose books Anthropic downloaded from book piracy websites.

    The judge ruled they can represent the class, and Anthropic appealed the ruling. During this appeal, an industry group filed an amicus brief with arguments supporting Anthropic's position. This is not uncommon; The Onion famously filed an amicus brief with the Supreme Court when it was about to rule on issues of parody. Like everything The Onion writes, it's a good piece of satire: https://www.supremecourt.gov/DocketPDF/22/22-293/242292/20221003125252896_35295545_1-22.10.03 - Novak-Parma - Onion Amicus Brief.pdf

  • Copyright is a leftover mechanism from slavery and it will be interesting to see how it gets challenged here, given that the wealthy view AI as an extension of themselves and not as a normal employee. Genuinely think the copyright cases from AI will be huge.

    My last comment was wrong, I've read through the filings of the case.

    The judge has already ruled that training the LLMs using the books was so obviously fair use that the claim was dismissed at summary judgment (my bolds):

    To summarize the analysis that now follows, the use of the books at issue to train Claude and its precursors was exceedingly transformative and was a fair use under Section 107 of the Copyright Act. The digitization of the books purchased in print form by Anthropic was also a fair use, but not for the same reason as applies to the training copies. Instead, it was a fair use because all Anthropic did was replace the print copies it had purchased for its central library with more convenient, space-saving, and searchable digital copies without adding new copies, creating new works, or redistributing existing copies. However, Anthropic had no entitlement to use pirated copies for its central library, and creating a permanent, general-purpose library was not itself a fair use excusing Anthropic's piracy.

    The only issue remaining in this case is that they downloaded copyrighted material with BitTorrent, the kind of lawsuit that has been going on since Napster. They'll probably be required to pay for all 196,640 books that they pirated, plus some other damages.

  • Copyright owners winning the case maintains the status quo.

    The AI companies winning the case means anything leaked on the internet or even just hosted by a company can be used by anyone, including private photos and communication.

    Copyright owners then become the new AI companies, and compared to now, where open-source AI is a possibility, it never will be, because only they will have enough content to train models. And without any competition, enshittification will go full speed ahead: the chatbots you don't like will still be there, except now they will try to sell you stuff, and you can't even choose a chatbot that doesn't want to upsell you.

  • I say move it out of the US

    They should have done that long ago, and if they haven't already started a backup in both Europe and China, it's high time.

  • If scraping is illegal, so is the Internet Archive, and that would be an immense loss for the world.

    This is the real concern. Copyright abuse has been rampant for a long time, and the only reason things like the Internet Archive are allowed to exist is because the copyright holders don't want to pick a fight they could potentially lose and lessen their hold on the IPs they're hoarding. The AI case is the perfect thing for them, because it's a very clear violation with a good amount of public support on their side, and winning will allow them to crack down even harder on all the things like the Internet Archive that should be fair use. AI is bad, but this fight won't benefit the public either way.

  • People cheering for this have no idea of the consequence of their copyright-maximalist position.

    If using images, text, etc. to train a model is copyright infringement, then there will be NO open models, because open-source model creators could not possibly obtain licensing for every piece of written or visual media in the Common Crawl dataset, which is what most of these things are trained on.

    As it stands now, corporations don't have a monopoly on AI specifically because copyright doesn't apply to AI training. Everyone has access to Common Crawl and the other large, public, datasets made from crawling the public Internet and so anyone can train a model on their own without worrying about obtaining billions of different licenses from every single individual who has ever written a word or drawn a picture.

    If there is a ruling that training violates copyright, then the only entities that could possibly afford to train LLMs or diffusion models are companies that own a large amount of copyrighted material. Sure, one company will lose a lot of money and/or be destroyed, but the legal precedent would be set so that it is impossible for anyone who doesn't have billions of dollars to train AI.

    People are shortsightedly seeing this as a victory for artists or some other nonsense. It's not. This is a fight where large copyright holders (Disney and other large publishing companies) want to completely own the ability to train AI because they own most of the large stores of copyrighted material.

    If the copyright holders win this then the open source training material, like Common Crawl, would be completely unusable to train models in the US/the West because any person who has ever posted anything to the Internet in the last 25 years could simply sue for copyright infringement.
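The Common Crawl data the comments keep referring to really is publicly queryable: each crawl exposes a CDX index that returns one JSON record per captured page, pointing to an offset inside a WARC archive. A minimal sketch of building such a query and parsing a record; the collection name is just an example of the `CC-MAIN-YYYY-WW` pattern, and the sample record below is made up in the shape the index returns, not fetched live:

```python
import json
from urllib.parse import urlencode

# Example collection name; real collections are listed at index.commoncrawl.org.
INDEX = "https://index.commoncrawl.org/CC-MAIN-2024-10-index"

def build_query(url_pattern: str) -> str:
    """Build a CDX index query URL for a given URL or domain pattern."""
    return INDEX + "?" + urlencode({"url": url_pattern, "output": "json"})

def parse_record(line: str) -> dict:
    """Each result line is a standalone JSON object describing one capture."""
    rec = json.loads(line)
    return {
        "url": rec["url"],            # captured page URL
        "warc": rec["filename"],      # WARC file holding the payload
        "offset": int(rec["offset"]), # byte offset of the record in the WARC
        "length": int(rec["length"]), # record length in bytes
    }

# Illustrative sample record (not real crawl data):
sample = ('{"url": "https://example.com/", '
          '"filename": "crawl-data/CC-MAIN-2024-10/segments/x/warc/y.warc.gz", '
          '"offset": "1234", "length": "5678"}')
print(parse_record(sample)["offset"])  # → 1234
```

The point being made above is exactly this openness: anyone with an HTTP client can locate and fetch the same training data the large labs use, without negotiating a single license.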

    Anybody can use copyrighted works under fair use for research, more so if your LLM model is open source (I would say this fair use should only actually apply if your model is open source...).
    You are wrong.

    We don't need to break copyright rights that protect us from corporations in this case, or also incidentally protect open source and libre software.

  • Distributed computing projects, large non-profits, people in the near future with much more powerful and cheaper hardware, governments which are interested in providing public services to their citizens, etc.

    Look at other large technology projects. The Human Genome Project spent $3 billion to sequence the first genome but now you can have it done for around $500. This cost reduction is due to the massive, combined effort of tens of thousands of independent scientists working on the same problem. It isn't something that would have happened if Purdue Pharma owned the sequencing process and required every scientist to purchase a license from them in order to do research.

    LLM and diffusion models are trained on the works of everyone who's ever been online. This work, generated by billions of human-hours, is stored in the Common Crawl datasets and is freely available to anyone who wants it. This data is both priceless and owned by everyone. We should not be cheering for a world where it is illegal to use this dataset that we all created and, instead, we are forced to license massive datasets from publishing companies.

    Progress on these types of models would immediately stop; there would be 3-4 corporations who could afford the licenses. They would have a de facto monopoly on LLMs and could enshittify them without worrying about competition.

    The world you're envisioning would only have paid licenses; who's to say we can't have a "free for non-commercial purposes" license style for it all?

  • Let's go baby! The law is the law, and it applies to everybody

    If the "genie doesn't go back in the bottle", make him pay for what he's stealing.

    The law is not the law.
    I am the law.

    insert awesome guitar riff here

    Reference: https://youtu.be/Kl_sRb0uQ7A

  • This is the real concern. Copyright abuse has been rampant for a long time, and the only reason things like the Internet Archive are allowed to exist is because the copyright holders don't want to pick a fight they could potentially lose and lessen their hold on the IPs they're hoarding. The AI case is the perfect thing for them, because it's a very clear violation with a good amount of public support on their side, and winning will allow them to crack down even harder on all the things like the Internet Archive that should be fair use. AI is bad, but this fight won't benefit the public either way.

    I wouldn't even say AI is bad. I currently have Qwen 3 running on my own GPU, giving me a course in RegEx and how to use it. It sometimes makes mistakes in the examples (we all know chatbots are shit when it comes to the r's in strawberry), but I see it as "spot the error" training for me, and the instructions themselves have been error-free so far; since I do the lessons myself, I can easily spot if something goes wrong.

    AI crammed into everything because venture capitalists try to see what sticks is probably the main reason public opinion of chatbots is bad, and I don't condone that either, but the technology itself has uses and is an impressive accomplishment.

    Same with image generation: I am shit at drawing, and I don't have the money to commission art when I want something specific, but I can generate what I want for myself.

    If the copyright side wins, we all might lose the option to run image generation and LLMs on our own hardware, there will never be an open-source LLM, and resources that are important to us all will come even more under fire than they already are. Copyright holders will be the new AI companies, and without competition the enshittification will start instantly.
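The "r's in strawberry" failure mentioned above comes from LLMs operating on tokens rather than individual characters, so letter-counting questions trip them up. A few lines of ordinary code (here using the RegEx the commenter is studying) solve it deterministically; a minimal sketch:

```python
import re

def count_letter(text: str, letter: str) -> int:
    """Count occurrences of a single letter via regex findall.

    re.escape guards against letters that are regex metacharacters.
    """
    return len(re.findall(re.escape(letter), text))

print(count_letter("strawberry", "r"))  # → 3
```

This is also why "spot the error" is a sound way to study with a chatbot: for anything mechanically checkable, a one-liner can verify the model's claim.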

  • Well, theft has never been the best foundation for a business, has it?

    While I completely agree that copyright terms are completely overblown, they are valid law that other people suffer under, so it is 100% fair to make them suffer the same. Or worse, as they all broke the law for commercial gain.

    Well, theft has never been the best foundation for a business, has it?

    History would suggest otherwise.

    I wouldn't even say AI is bad. I currently have Qwen 3 running on my own GPU, giving me a course in RegEx and how to use it. It sometimes makes mistakes in the examples (we all know chatbots are shit when it comes to the r's in strawberry), but I see it as "spot the error" training for me, and the instructions themselves have been error-free so far; since I do the lessons myself, I can easily spot if something goes wrong.

    AI crammed into everything because venture capitalists try to see what sticks is probably the main reason public opinion of chatbots is bad, and I don't condone that either, but the technology itself has uses and is an impressive accomplishment.

    Same with image generation: I am shit at drawing, and I don't have the money to commission art when I want something specific, but I can generate what I want for myself.

    If the copyright side wins, we all might lose the option to run image generation and LLMs on our own hardware, there will never be an open-source LLM, and resources that are important to us all will come even more under fire than they already are. Copyright holders will be the new AI companies, and without competition the enshittification will start instantly.

    What you see as "spot the error" type training, another person sees as absolute fact that they internalize and use to make decisions that impact the world. The internet gave rise to the golden age of conspiracy theories, which is having a major impact on the worsening political climate, and it's because the average user isn't able to differentiate information from disinformation. AI chatbots giving people the answer they're looking for rather than the truth is only going to compound the issue.
