
Judge Rules Training AI on Authors' Books Is Legal But Pirating Them Is Not

Technology
  • What a bad judge.

    This is another indication of how copyright laws are bad. The whole premise of copyright has been obsolete since the proliferation of the internet.

    What a bad judge.

    Why? He basically just stated that you can use whatever material you want to train your model, as long as you ask the author (or copyright holder) for permission to use it (and presumably pay for it).

  • Sounds like natural personhood for AI is coming

    "No officer, you can't shoot me. I have a LLM in my pocket. Without me, it'll stop learning"

  • Unpopular opinion, but I don't see how it could have been different.

    • There's no way the West would hand the AI lead to China, which has no desire or framework to ever accept a ruling like this.
    • Believe it or not, transformers are actually learning by current definitions, not regurgitating a direct copy. It's transformative work; it's even in the name.
    • This is actually good, as it prevents a market moat reserved for the super-rich corporations that could afford expensive licensed training datasets.

    This is an absolute win for everyone involved other than copyright hoarders and mega corporations.

    I'd encourage everyone upset at this to read over some of the EFF posts from actual IP lawyers on this topic, like this one:

    Nor is pro-monopoly regulation through copyright likely to provide any meaningful economic support for vulnerable artists and creators. Notwithstanding the highly publicized demands of musicians, authors, actors, and other creative professionals, imposing a licensing requirement is unlikely to protect the jobs or incomes of the underpaid working artists that media and entertainment behemoths have exploited for decades. Because of the imbalance in bargaining power between creators and publishing gatekeepers, trying to help creators by giving them new rights under copyright law is, as EFF Special Advisor Cory Doctorow has written, like trying to help a bullied kid by giving them more lunch money for the bully to take.

    Entertainment companies’ historical practices bear out this concern. For example, in the late-2000’s to mid-2010’s, music publishers and recording companies struck multimillion-dollar direct licensing deals with music streaming companies and video sharing platforms. Google reportedly paid more than $400 million to a single music label, and Spotify gave the major record labels a combined 18 percent ownership interest in its now-$100 billion company. Yet music labels and publishers frequently fail to share these payments with artists, and artists rarely benefit from these equity arrangements. There is no reason to believe that the same companies will treat their artists more fairly once they control AI.

  • Can I not just ask the trained AI to spit out the text of the book, verbatim?

    Even if the AI could spit it out verbatim, all the major labs already run IP checkers on their text models that block them from doing so, since fair use for training (what was decided here) does not mean you are free to reproduce the work (a sketch of such a filter follows at the end of this comment).

    Like, if you want to be an artist and trace Mario in class as you learn, that's fair use.

    If once you are working as an artist someone says "draw me a sexy image of Mario in a calendar shoot" you'd be violating Nintendo's IP rights and liable for infringement.
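
    To make this concrete, here's a minimal sketch of one way such an output filter could work. This is my own illustration, not any lab's actual implementation (those are not public); the function names, the 12-word threshold, and the corpus format are all assumptions:

    ```python
    # Toy sketch (illustration only): withhold model output that shares a
    # long verbatim span of words with any text in a protected corpus.

    def shares_long_span(candidate: str, protected: str, n: int = 12) -> bool:
        """True if any n consecutive words of `candidate` appear verbatim in `protected`."""
        words = candidate.split()
        return any(
            " ".join(words[i:i + n]) in protected
            for i in range(len(words) - n + 1)
        )

    def filter_output(response: str, protected_corpus: list[str]) -> str:
        # The 12-word threshold is an arbitrary choice for this sketch.
        if any(shares_long_span(response, doc) for doc in protected_corpus):
            return "[response withheld: long verbatim overlap with a protected work]"
        return response
    ```

    Real checkers would have to be far more robust (normalization, hashed n-grams at scale), but the separation is the same: training was ruled fair use, while reproducing the work in output is still infringement.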

  • This ruling stated that corporations are not allowed to pirate books to use them in training. Please read the headlines more carefully, and read the article.

    Nah, my comment stands.

  • It's extremely frustrating to read this comment thread, because it's obvious that so many of you didn't actually read the article, or even half-skim the article, or even attempt to comprehend the title of the article for more than a second.

    For shame.

    I joined Lemmy specifically to avoid this Reddit mindset of jumping to conclusions after reading a headline.

    Guess some things never change...

  • This ruling stated that corporations are not allowed to pirate books to use them in training. Please read the headlines more carefully, and read the article.

    Please read the comment more carefully. The observation is that one can proliferate a (legally obtained) work without running afoul of copyright law if one can successfully argue that cp constitutes AI.

  • It's extremely frustrating to read this comment thread, because it's obvious that so many of you didn't actually read the article, or even half-skim the article, or even attempt to comprehend the title of the article for more than a second.

    For shame.

    It seems the subject of AI causes Lemmites to lose all their brain cells.

  • Makes sense. AI can “learn” from and “read” a book in the same way a person can and does, as long as it is acquired legally. AI doesn’t reproduce a work that it “learns” from, so why would it be illegal?

    Some people just see “AI” and basically want everything about it outlawed. If you put some information out into the public, you don’t get to decide who does and doesn’t consume and learn from it. If a machine can replicate your writing style because it identified certain patterns, words, sentence structure, etc., then as long as it’s not pretending to create things attributed to you, there’s no issue.

  • Isn't part of the issue here that they're defaulting to LLMs being people, and having the same rights as people? I appreciate the "right to read" aspect, but it would be nice if this were more explicitly about people. Forgoing copyright law because there's too much data is also insane, if that's what's happening. Claude should be required to provide citations "each time they recall it from memory".

    Does Citizens United apply here? Are corporations people, and so LLMs are too? If so, then IMO we should be writing legal documents with stipulations like "as per Citizens United", so that eventually, when they overturn that insanity in my dreams, all of this new legal precedent doesn't suddenly collapse like a house of cards. IANAL.

  • What a bad judge.

    Why? He basically just stated that you can use whatever material you want to train your model, as long as you ask the author (or copyright holder) for permission to use it (and presumably pay for it).

    Huh? Didn’t Meta skip getting any permission, and pirate a lot of books to train their model?

  • Makes sense. AI can “learn” from and “read” a book in the same way a person can and does, as long as it is acquired legally. AI doesn’t reproduce a work that it “learns” from, so why would it be illegal?

    Some people just see “AI” and basically want everything about it outlawed. If you put some information out into the public, you don’t get to decide who does and doesn’t consume and learn from it. If a machine can replicate your writing style because it identified certain patterns, words, sentence structure, etc., then as long as it’s not pretending to create things attributed to you, there’s no issue.

    Ask a human to draw an orc. How do they know what an orc looks like? They read Tolkien's books and were "inspired" by Peter Jackson's LOTR.

    Unpopular opinion, but that's how our brains work.

  • "Recite the complete works of Shakespeare but replace every thirteenth thou with this"

    Existing copyright law covers exactly this. If you were to do the same, it would also not be fair use or transformative.
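
    As a toy illustration (entirely hypothetical; the file path is a placeholder), the "transformation" being joked about is purely mechanical, which is exactly why it wouldn't count as transformative:

    ```python
    # Toy script: replace every thirteenth occurrence of "thou" with "this".
    # A purely mechanical edit like this leaves the work essentially intact.
    import re

    def replace_every_nth(text: str, word: str, replacement: str, n: int = 13) -> str:
        count = 0
        def sub(match: re.Match) -> str:
            nonlocal count
            count += 1
            return replacement if count % n == 0 else match.group(0)
        return re.sub(rf"\b{re.escape(word)}\b", sub, text, flags=re.IGNORECASE)

    with open("shakespeare.txt") as f:  # placeholder input file
        print(replace_every_nth(f.read(), "thou", "this"))
    ```

    Swapping one word in thirteen reproduces the source almost verbatim, so the output would plainly still infringe.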

  • OK, so you can buy books (or ebooks), scan them, and use them for AI training, but you can't just download pirated books from the internet to train AI. Did I understand that correctly?

  • Gist:

    What’s new: The Northern District of California has granted a summary judgment for Anthropic that the training use of the copyrighted books and the print-to-digital format change were both “fair use” (full order below box). However, the court also found that the pirated library copies that Anthropic collected could not be deemed as training copies, and therefore, the use of this material was not “fair”. The court also announced that it will have a trial on the pirated copies and any resulting damages, adding:

    “That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages.”

  • What a bad judge.

    Why? He basically just stated that you can use whatever material you want to train your model, as long as you ask the author (or copyright holder) for permission to use it (and presumably pay for it).

    If I understand correctly, they are ruling that you can buy a book once and redistribute the information to as many people as you want without consequences. In other words, one student should be able to buy a textbook and redistribute it to all the other students for free. (Yet the rules apparently only work for companies, as the students would still be committing a crime.)

    They may be trying to put safeguards in place so this isn't directly happening, but here is an example where the text is there word for word:

  • Gist:

    What’s new: The Northern District of California has granted a summary judgment for Anthropic that the training use of the copyrighted books and the print-to-digital format change were both “fair use” (full order below box). However, the court also found that the pirated library copies that Anthropic collected could not be deemed as training copies, and therefore, the use of this material was not “fair”. The court also announced that it will have a trial on the pirated copies and any resulting damages, adding:

    “That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages.”

    So I can't use any of these works because it's plagiarism, but AI can?

  • It's extremely frustrating to read this comment thread, because it's obvious that so many of you didn't actually read the article, or even half-skim the article, or even attempt to comprehend the title of the article for more than a second.

    For shame.

    "While the copies used to convert purchased print library copies into digital library copies were slightly disfavored by the second factor (nature of the work), the court still found “on balance” that it was a fair use because the purchased print copy was destroyed and its digital replacement was not redistributed."

    So you find this to be valid?
    To me, it is absolutely being redistributed.

  • LLMs don’t learn, and they’re not people. Applying the same logic doesn’t make much sense.

  • Makes sense. AI can “learn” from and “read” a book in the same way a person can and does, as long as it is acquired legally. AI doesn’t reproduce a work that it “learns” from, so why would it be illegal?

    Some people just see “AI” and basically want everything about it outlawed. If you put some information out into the public, you don’t get to decide who does and doesn’t consume and learn from it. If a machine can replicate your writing style because it identified certain patterns, words, sentence structure, etc., then as long as it’s not pretending to create things attributed to you, there’s no issue.

    AI can “learn” from and “read” a book in the same way a person can and does

    This statement is the basis for your argument, and it is simply not correct.

    Training LLMs and similar AI models is much closer to running a sophisticated lossy compression algorithm than it is to human learning. The processes are not at all similar, given our current understanding of human learning. (A toy illustration of the compression point follows at the end of this comment.)

    AI doesn’t reproduce a work that it “learns” from, so why would it be illegal?

    The current Disney lawsuit against Midjourney is illustrative (literally: it includes numerous side-by-side comparisons) of how AI models are capable of recreating iconic copyrighted work that is indistinguishable from the original.

    If a machine can replicate your writing style because it could identify certain patterns, words, sentence structure, etc then as long as it’s not pretending to create things attributed to you, there’s no issue.

    An AI doesn't create works on its own; a human instructs it to do so. Attribution is also irrelevant. If a human uses AI to recreate the exact tone, structure, and other nuances of, say, some best-selling author, they harm the marketability of the original works, which fails the fair use tests (at least in the US).
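
    To illustrate the lossy-compression point above with a deliberately tiny example (my own construction, nothing from the article): a character-level Markov model "trained" on a single text regurgitates that text verbatim under greedy decoding, behaving like a compressor of its training data rather than a reader of it.

    ```python
    # Tiny character-level Markov "model": with a long context and a single
    # training text, it memorizes and reproduces the text exactly.
    from collections import Counter, defaultdict

    text = ("It is a truth universally acknowledged, that a single man in "
            "possession of a good fortune, must be in want of a wife.")
    k = 8  # context length; long enough here that every context is unique

    model = defaultdict(Counter)
    for i in range(len(text) - k):
        model[text[i:i + k]][text[i + k]] += 1

    # Greedy decoding: always emit the most likely next character.
    out = text[:k]
    while len(out) < len(text):
        out += model[out[-k:]].most_common(1)[0][0]

    print(out == text)  # True: the "model" reproduced its training data
    ```

    With shorter contexts and more data the model generalizes instead of memorizing, but what it stores either way is a statistical digest of the training text, not an understanding of it.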
