Skip to content

Judge Rules Training AI on Authors' Books Is Legal But Pirating Them Is Not

Technology
221 116 1
  • If a human did that it’s still plagiarism.

    Oh I agree it should be, but following the judges ruling, I don't see how it could be. You trained an LLM on textbooks that were purchased, not pirated. And the LLM distributed the responses.

    (Unless you mean the human reworded them, then yeah, we aren't special apparently)

  • So I can't use any of these works because it's plagiarism but AI can?

    Why would it be plagiarism if you use the knowledge you gain from a book?

  • Does buying the book give you license to digitise it?

    Does owning a digital copy of the book give you license to convert it into another format and copy it into a database?

    Definitions of "Ownership" can be very different.

    You can digitize the books you own. You do not need a license for that. And of course you could put that digital format into a database. As databases are explicit exceptions from copyright law. If you want to go to the extreme: delete first copy. Then you have only in the database. However: AIs/LLMs are not based on data bases. But on neural networks. The original data gets lost when "learned".

  • Oh I agree it should be, but following the judges ruling, I don't see how it could be. You trained an LLM on textbooks that were purchased, not pirated. And the LLM distributed the responses.

    (Unless you mean the human reworded them, then yeah, we aren't special apparently)

    Yes, on the second part. Just rearranging or replacing words in a text is not transformative, which is a requirement. There is an argument that the ‘AI’ are capable of doing transformative work, but the tokenizing and weight process is not magic and in my use of multiple LLM’s they do not have an understanding of the material any more then a dictionary understands the material printed on its pages.

    An example was the wine glass problem. Art ‘AI’s were unable to display a wine glass filled to the top. No matter how it was prompted, or what style it aped, it would fail to do so and report back that the glass was full. But it could render a full glass of water. It didn’t understand what a full glass was, not even for the water. How was this possible? Well there was very little art of a full wine glass, because society has an unspoken rule that a full wine glass is the epitome of gluttony, and it is to be savored not drunk. Where as the reference of full glasses of water were abundant. It doesn’t know what full means, just that pictures of full glass of water are tied to phrases full, glass, and water.

  • If what you are saying is true, why were these ‘AI’s” incapable of rendering a full wine glass? It ‘knows’ the concept of a full glass of water, but because of humanities social pressures, a full wine glass being the epitome of gluttony, art work did not depict a full wine glass, no matter how ai prompters demanded, it was unable to link the concepts until it was literally created for it to regurgitate it out. It seems ‘AI’ doesn’t really learn, but regurgitates art out in collages of taken assets, smoothed over at the seams.

    Copilot did it just fine

  • Formatting thing: if you start a line in a new paragraph with four spaces, it assumes that you want to display the text as a code and won't line break.

    This means that the last part of your comment is a long line that people need to scroll to see. If you remove one of the spaces, or you remove the empty line between it and the previous paragraph, it'll look like a normal comment

    With an empty line of space:

    1 space - and a little bit of writing just to see how the text will wrap. I don't really have anything that I want to put here, but I need to put enough here to make it long enough to wrap around. This is likely enough.

    2 spaces - and a little bit of writing just to see how the text will wrap. I don't really have anything that I want to put here, but I need to put enough here to make it long enough to wrap around. This is likely enough.

    3 spaces - and a little bit of writing just to see how the text will wrap. I don't really have anything that I want to put here, but I need to put enough here to make it long enough to wrap around. This is likely enough.

    4 spaces -  and a little bit of writing just to see how the text will wrap. I don't really have anything that I want to put here, but I need to put enough here to make it long enough to wrap around. This is likely enough.
    

    Thanks, I had copy-pasted it from the website 🙂

  • Copilot did it just fine

    1 it’s not full, but closer then it was.

    1. I specifically said that the AI was unable to do it until someone specifically made a reference so that it could start passing the test so it’s a little bit late to prove much.
  • But I thought they admitted to torrenting terabytes of ebooks?

    Facebook (Meta) torrented TBs from Libgen, and their internal chats leaked so we know about that, and IIRC they've been sued. Maybe you're thinking of that case?

  • FTA:

    Anthropic warned against “[t]he prospect of ruinous statutory damages—$150,000 times 5 million books”: that would mean $750 billion.

    So part of their argument is actually that they stole so much that it would be impossible for them/anyone to pay restitution, therefore we should just let them off the hook.

    What is means is they don't own the models. They are the commons of humanity, they are merely temporary custodians. The nightnare ending is the elites keeping the most capable and competent models for themselves as private play things. That must not be allowed to happen under any circumstances. Sue openai, anthropic and the other enclosers, sue them for trying to take their ball and go home. Disposses them and sue the investors for their corrupt influence on research.

  • And thus the singularity was born.

    Yes please a singularity of intellectual property that collapses the idea of ownong ideas. Of making the infinitely freely copyableinto a scarce ressource. What corrupt idiocy this has been. Landlords for ideas and look what garbage it has been producing.

  • Yes, on the second part. Just rearranging or replacing words in a text is not transformative, which is a requirement. There is an argument that the ‘AI’ are capable of doing transformative work, but the tokenizing and weight process is not magic and in my use of multiple LLM’s they do not have an understanding of the material any more then a dictionary understands the material printed on its pages.

    An example was the wine glass problem. Art ‘AI’s were unable to display a wine glass filled to the top. No matter how it was prompted, or what style it aped, it would fail to do so and report back that the glass was full. But it could render a full glass of water. It didn’t understand what a full glass was, not even for the water. How was this possible? Well there was very little art of a full wine glass, because society has an unspoken rule that a full wine glass is the epitome of gluttony, and it is to be savored not drunk. Where as the reference of full glasses of water were abundant. It doesn’t know what full means, just that pictures of full glass of water are tied to phrases full, glass, and water.

    Yeah, we had a fun example a while ago, let me see if I can still find it.

    We would ask to create a photo of a cat with no tail.

    And then tell it there was indeed a tail, and ask it to draw an arrow to point to it.

    It just points to where the tail most commonly is, or was said to be in a picture it was not referencing.

    Edit: granted now, it shows a picture of a cat where you just can't see the tail in the picture.

  • Makes sense. AI can “learn” from and “read” a book in the same way a person can and does, as long as it is acquired legally. AI doesn’t reproduce a work that it “learns” from, so why would it be illegal?

    Some people just see “AI” and want everything about it outlawed basically. If you put some information out into the public, you don’t get to decide who does and doesn’t consume and learn from it. If a machine can replicate your writing style because it could identify certain patterns, words, sentence structure, etc then as long as it’s not pretending to create things attributed to you, there’s no issue.

    AI can “learn” from and “read” a book in the same way a person can and does,

    If it's in the same way, then why do you need the quotation marks? Even you understand that they're not the same.

    And either way, machine learning is different from human learning in so many ways it's ridiculous to even discuss the topic.

    AI doesn’t reproduce a work that it “learns” from

    That depends on the model and the amount of data it has been trained on. I remember the first public model of ChatGPT producing a sentence that was just one word different from what I found by googling the text (from some scientific article summary, so not a trivial sentence that could line up accidentally). More recently, there was a widely reported-on study of AI-generated poetry where the model was requested to produce a poem in the style of Chaucer, and then produced a letter-for-letter reproduction of the well-known opening of the Canterbury Tales. It hasn't been trained on enough Middle English poetry and thus can't generate any of it, so it defaulted to copying a text that probably occurred dozens of times in its training data.

  • As long as they don't use exactly the same words in the book, yeah, as I understand it.

    How they don't use same words as in the book ? That's not how LLM works. They use exactly same words if the probabilities align. It's proved by this study. https://arxiv.org/abs/2505.12546

  • Copilot did it just fine

    Bro are you a robot yourself? Does that look like a glass full of wine?

  • I am educated on this. When an ai learns, it takes an input through a series of functions and are joined at the output. The set of functions that produce the best output have their functions developed further. Individuals do not process information like that. With poor exploration and biasing, the output of an AI model could look identical to its input. It did not "learn" anymore than a downloaded video ran through a compression algorithm.

    You are obviously not educated on this.

    It did not “learn” anymore than a downloaded video ran through a compression algorithm.
    Just: LoLz.

  • Make an AI that is trained on the books.

    Tell it to tell you a story for one of the books.

    Read the story without paying for it.

    The law says this is ok now, right?

    The LLM is not repeating the same book. The owner of the LLM has the exact same rights to do with what his LLM is reading, as you have to do with what ever YOU are reading.

    As long as it is not a verbatim recitation, it is completely okay.

    According to story telling theory: there are only roughly 15 different story types anyway.

  • You are obviously not educated on this.

    It did not “learn” anymore than a downloaded video ran through a compression algorithm.
    Just: LoLz.

    I am not sure what your contention, or gotcha, is with the comment above but they are quite correct. And additionally chose quite an apt example with video compression since in most ways current 'AI' effectively functions as a compression algorithm, just for our language corpora instead of video.

  • Lawsuits are multifaceted. This statement isn't a a defense or an argument for innocence, it's just what it says - an assertion that the proposed damages are unreasonably high. If the court agrees, the plaintiff can always propose a lower damage claim that the court thinks is reasonable.

    You’re right, each of the 5 million books’ authors should agree to less payment for their work, to make the poor criminals feel better.

    If I steal $100 from a thousand people and spend it all on hookers and blow, do I get out of paying that back because I don’t have the funds? Should the victims agree to get $20 back instead because that’s more within my budget?

  • Does buying the book give you license to digitise it?

    Does owning a digital copy of the book give you license to convert it into another format and copy it into a database?

    Definitions of "Ownership" can be very different.

    It seems like a lot of people misunderstand copyright so let's be clear: the answer is yes. You can absolutely digitize your books. You can rip your movies and store them on a home server and run them through compression algorithms.

    Copyright exists to prevent others from redistributing your work so as long as you're doing all of that for personal use, the copyright owner has no say over what you do with it.

    You even have some degree of latitude to create and distribute transformative works with a violation only occurring when you distribute something pretty damn close to a copy of the original. Some perfectly legal examples: create a word cloud of a book, analyze the tone of news article to help you trade stocks, produce an image containing the most prominent color in every frame of a movie, or create a search index of the words found on all websites on the internet.

    You can absolutely do the same kinds of things an AI does with a work as a human.

  • That's legal just don't look at them or enjoy them.

    Yeah, I don't think that would fly.

    "Your honour, I was just hoarding that terabyte of Hollywood films, I haven't actually watched them."

  • 69 Stimmen
    17 Beiträge
    7 Aufrufe
    H
    Set up arrs, you basically set it and forget it.
  • 179 Stimmen
    9 Beiträge
    12 Aufrufe
    R
    They've probably just crunched the numbers and determined the cost of a recall in Canada was greater than the cost of law suits when your house does burn down
  • 40 Stimmen
    10 Beiträge
    9 Aufrufe
    T
    Clearly the author doesn't understand how capitalism works. If Apple can pick you up by the neck, turn you upside down, and shake whatever extra money it can from you then it absolutely will do so. The problem is that one indie developer doesn't have any power over Apple... so they can go fuck themselves. The developer is granted the opportunity to grovel at the feet of their betters (richers) and pray that they are allowed to keep enough of their own crop to survive the winter. If they don't survive... then some other dev will probably jump at the chance to take part in the "free market" and demonstrate their worth.
  • Catbox.moe got screwed 😿

    Technology technology
    40
    55 Stimmen
    40 Beiträge
    24 Aufrufe
    archrecord@lemm.eeA
    I'll gladly give you a reason. I'm actually happy to articulate my stance on this, considering how much I tend to care about digital rights. Services that host files should not be held responsible for what users upload, unless: The service explicitly caters to illegal content by definition or practice (i.e. the if the website is literally titled uploadyourcsamhere[.]com then it's safe to assume they deliberately want to host illegal content) The service has a very easy mechanism to remove illegal content, either when asked, or through simple monitoring systems, but chooses not to do so (catbox does this, and quite quickly too) Because holding services responsible creates a whole host of negative effects. Here's some examples: Someone starts a CDN and some users upload CSAM. The creator of the CDN goes to jail now. Nobody ever wants to create a CDN because of the legal risk, and thus the only providers of CDNs become shady, expensive, anonymously-run services with no compliance mechanisms. You run a site that hosts images, and someone decides they want to harm you. They upload CSAM, then report the site to law enforcement. You go to jail. Anybody in the future who wants to run an image sharing site must now self-censor to try and not upset any human being that could be willing to harm them via their site. A social media site is hosting the posts and content of users. In order to be compliant and not go to jail, they must engage in extremely strict filtering, otherwise even one mistake could land them in jail. All users of the site are prohibited from posting any NSFW or even suggestive content, (including newsworthy media, such as an image of bodies in a warzone) and any violation leads to an instant ban, because any of those things could lead to a chance of actually illegal content being attached. This isn't just my opinion either. Digital rights organizations such as the Electronic Frontier Foundation have talked at length about similar policies before. To quote them: "When social media platforms adopt heavy-handed moderation policies, the unintended consequences can be hard to predict. For example, Twitter’s policies on sexual material have resulted in posts on sexual health and condoms being taken down. YouTube’s bans on violent content have resulted in journalism on the Syrian war being pulled from the site. It can be tempting to attempt to “fix” certain attitudes and behaviors online by placing increased restrictions on users’ speech, but in practice, web platforms have had more success at silencing innocent people than at making online communities healthier." Now, to address the rest of your comment, since I don't just want to focus on the beginning: I think you have to actively moderate what is uploaded Catbox does, and as previously mentioned, often at a much higher rate than other services, and at a comparable rate to many services that have millions, if not billions of dollars in annual profits that could otherwise be spent on further moderation. there has to be swifter and stricter punishment for those that do upload things that are against TOS and/or illegal. The problem isn't necessarily the speed at which people can be reported and punished, but rather that the internet is fundamentally harder to track people on than real life. It's easy for cops to sit around at a spot they know someone will be physically distributing illegal content at in real life, but digitally, even if you can see the feed of all the information passing through the service, a VPN or Tor connection will anonymize your IP address in a manner that most police departments won't be able to track, and most three-letter agencies will simply have a relatively low success rate with. There's no good solution to this problem of identifying perpetrators, which is why platforms often focus on moderation over legal enforcement actions against users so frequently. It accomplishes the goal of preventing and removing the content without having to, for example, require every single user of the internet to scan an ID (and also magically prevent people from just stealing other people's access tokens and impersonating their ID) I do agree, however, that we should probably provide larger amounts of funding, training, and resources, to divisions who's sole goal is to go after online distribution of various illegal content, primarily that which harms children, because it's certainly still an issue of there being too many reports to go through, even if many of them will still lead to dead ends. I hope that explains why making file hosting services liable for user uploaded content probably isn't the best strategy. I hate to see people with good intentions support ideas that sound good in practice, but in the end just cause more untold harms, and I hope you can understand why I believe this to be the case.
  • Generative AI's most prominent skeptic doubles down

    Technology technology
    14
    1
    43 Stimmen
    14 Beiträge
    8 Aufrufe
    Z
    I don't think so, and I believe not even the current technology used for neural network simulations will bring us to AGI, yet alone LLMs.
  • 13 Stimmen
    22 Beiträge
    14 Aufrufe
    T
    You might enjoy this blog post someone linked in another thread earlier today https://www.wheresyoured.at/the-era-of-the-business-idiot/
  • The technology to end traffic deaths exists. Why aren’t we using it?

    Technology technology
    36
    43 Stimmen
    36 Beiträge
    14 Aufrufe
    M
    You’re seriously attempting to argue with me about whether or not transportation existed before cars?
  • AI will replace routine — freeing people for creativity.

    Technology technology
    14
    2
    42 Stimmen
    14 Beiträge
    8 Aufrufe
    G
    So you are against having machines do the work of blue collar workers? We should all be out in the fields with plows instead of using a tractor and assembling everything by hand in factories?