
Judge Rules Training AI on Authors' Books Is Legal But Pirating Them Is Not

Technology
  • If I understand correctly, they are ruling that you can buy a book once and redistribute the information to as many people as you want without consequences. I.e., one student should be able to buy a textbook and redistribute it to all the other students for free. (Yet the rules apparently only work for companies, as the students would still be committing a crime.)

    Well, it would be interesting if this case were used as precedent in a case involving a single student who did the same thing. But you are right.

    This was my understanding also, and why I think the judge is bad at their job.

  • AI can “learn” from and “read” a book in the same way a person can and does

    This statement is the basis for your argument and it is simply not correct.

    Training LLMs and similar AI models is much closer to a sophisticated lossy compression algorithm than it is to human learning. The processes are not at all similar given our current understanding of human learning.
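    To make the analogy concrete, here is a toy sketch (my own illustration, nowhere near real model internals): a lossy scheme keeps a small summary of the input and can only hand back an approximation, because the rest was discarded.

        # Toy lossy compression: keep only the 8 strongest frequency components
        # of a 256-sample signal. Illustrative sketch only, not how LLMs work.
        import numpy as np

        rng = np.random.default_rng(1)
        signal = np.sin(np.linspace(0, 8 * np.pi, 256)) + 0.1 * rng.normal(size=256)

        spectrum = np.fft.rfft(signal)              # 129 complex coefficients
        keep = np.argsort(np.abs(spectrum))[-8:]    # indices of the 8 largest
        compressed = np.zeros_like(spectrum)
        compressed[keep] = spectrum[keep]           # everything else is discarded

        approx = np.fft.irfft(compressed, n=len(signal))
        print(np.mean((signal - approx) ** 2))      # small error, but the original is gone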

    AI doesn’t reproduce a work that it “learns” from, so why would it be illegal?

    The current Disney lawsuit against Midjourney is illustrative (literally: it includes numerous side-by-side comparisons) of how AI models are capable of recreating iconic copyrighted work that is indistinguishable from the original.

    If a machine can replicate your writing style because it identified certain patterns, words, sentence structure, etc., then as long as it’s not pretending to create things attributed to you, there’s no issue.

    An AI doesn't create works on its own; a human instructs it to do so. Attribution is also irrelevant. If a human uses AI to recreate the exact tone, structure, and other nuances of, say, some best-selling author, they harm the marketability of the original works, which fails the fair use test (at least in the US).

    Even if we accept all your market-liberal premises without question... within your own rhetorical framework, the Disney lawsuit should be decided against Disney.

    If a human uses AI to recreate the exact tone, structure, and other nuances of, say, some best-selling author, they harm the marketability of the original works, which fails the fair use test (at least in the US).

    Says who? In a free market, why is competition from similar products and brands such a threat that it must be outlawed? Think about what you are advocating: you think authorship is so valuable, so special, that one should be granted a legally enforceable monopoly on even the loosest notions of authorship. That is the definition of a slippery slope, and yet it is the status quo of the society we live in.

    On it "harming the marketability of the original works": frankly, that's a fiction, and anyone advocating such ideas should just fucking weep about it instead of enforcing overreaching laws on the rest of us. If you can't sell your art because a machine made "too good a copy" of it, it wasn't good art in the first place, and that is not the fault of the machine. Even big pharma doesn't get to outright ban generic medications (even though they certainly tried)... it is patently fucking absurd to decry artists' lack of a state-enforced monopoly on their work. Why do you think we should extend such a radical policy towards... checks notes... tumblr artists and other commission-based creators? It's not good when big companies do it for themselves through lobbying, and it wouldn't be good to do it for "the little guy" either. The real artists working in the industry don't want the law changed this way, because they know it doesn't work in their favor. Disney's lawsuit is in the interest of Disney and big capital, not of artists themselves, despite what these large conglomerates that trade in IPs and dreams might try to convince the art world of.

  • If I understand correctly, they are ruling that you can buy a book once and redistribute the information to as many people as you want without consequences. I.e., one student should be able to buy a textbook and redistribute it to all the other students for free. (Yet the rules apparently only work for companies, as the students would still be committing a crime.)

    They may be trying to put safeguards in place so that it doesn't happen directly, but here is an example showing that the text is in there word for word:

    Not at all true. AI doesn’t just reproduce content it was trained on, on demand.

  • My interpretation was that AI companies can train on material they are licensed to use, but the courts have deemed that Anthropic pirated this material as they were not licensed to use it.

    In other words, if Anthropic had bought the physical or digital books, it would have been fine so long as their AI couldn't spit them out verbatim; but they didn't even do that, i.e. the AI crawler pirated the books.

    Does buying the book give you license to digitise it?

    Does owning a digital copy of the book give you license to convert it into another format and copy it into a database?

    Definitions of "Ownership" can be very different.

  • This was my understanding also, and why I think the judge is bad at their job.

    I suppose someone could develop an LLM that digests textbooks, rewords the text, and spits it back out, then distribute it for free, page for page. I don't think you can copyright the math problems... so if the exact wording is what gives it protection, that would have been changed.

  • I joined lemmy specifically to avoid this reddit mindset of jumping to conclusions after reading a headline.

    Guess some things never change...

    Well, to be honest, lemmy is less prone to knee-jerk reactionary discussion, but on a handful of topics it is virtually guaranteed to happen no matter what, even here. For example, this entire site, besides a handful of communities, is vigorously anti-AI; in the words of u/jsomae@lemmy.ml elsewhere in this comment chain:

    "It seems the subject of AI causes lemmites to lose all their braincells."

    I think there is definitely an interesting take on the sociology of the digital age in here somewhere, but it's too early in the morning to be tapping something like that out lol

  • You're getting douchevoted because on lemmy any AI-related comment that isn't negative enough about AI is the Devil's Work.

    Some communities on this site speak about machine learning exactly how I see grungy Europeans from pre-18th century manuscripts speaking about witches, Satan, and evil... as if it is some pervasive, black-magic miasma.

    As someone who is in the field of machine learning academically/professionally, I find it honestly kind of shocking, and it has largely informed my opinion of society at large as an adult. No one puts any effort into learning if they see the letters "A" and "I" in all caps next to each other; they immediately turn their brains off and start regurgitating points, responding reflexively, on Lemmy or otherwise. People talk about it so confidently while being so frustratingly unaware of their own ignorance on the matter, which, for lack of a better comparison... reminds me a lot of how human beings, historically and in fiction, have treated literal magic.

    That's my main issue with the entire swath of "pro vs anti AI" discourse... all these people treating something that, to me, is simple, daily reality as something entirely different from my own personal notion of it.

  • You can “use” them to learn from, just like “AI” can.

    What exactly do you think AI does when it “learns” from a book, for example? Do you think it will just spit out the entire book if you ask it to?

    I am educated on this. When an AI learns, it passes an input through a series of functions that are joined at the output. The sets of functions that produce the best outputs are developed further. Individuals do not process information like that. With poor exploration and biasing, the output of an AI model can look identical to its input. It has not "learned" any more than a downloaded video run through a compression algorithm has.
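    For what it's worth, here is that loop in miniature (a toy sketch with made-up numbers, not anything like a real model's scale): inputs pass through a function, the error at the output is measured, and the weights are nudged in whatever direction reduces it.

        # Toy "training": fit 3 weights by gradient descent on 100 examples.
        # Illustrative only; real models have billions of weights.
        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.normal(size=(100, 3))                     # 100 training inputs
        true_w = np.array([1.5, -2.0, 0.5])
        y = X @ true_w + rng.normal(scale=0.1, size=100)  # noisy target outputs

        w = np.zeros(3)                                   # the model: just 3 numbers
        for _ in range(500):
            pred = X @ w                                  # pass inputs through the function
            grad = 2 * X.T @ (pred - y) / len(y)          # direction that reduces the error
            w -= 0.1 * grad                               # keep and refine what works

        print(w)  # close to true_w; the 100 examples themselves are not stored in w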

  • LLMs don’t learn, and they’re not people. Applying the same logic doesn’t make much sense.

    The judge isn't saying that they learn or that they're people. He's saying that training falls into the same legal classification as learning.

  • Your very first statement, calling the basis of my argument incorrect, is itself incorrect lol.

    LLMs “learn” things from the content they consume. They don’t just take the content in wholesale and keep it there to regurgitate on command.

    On your last part: unless someone uses AI to recreate the tone etc. of a best-selling author *and then markets their book/writing as being from said author*, and as long as they don’t use trademarked characters etc., there’s no issue. You can’t copyright a style of writing.

    If what you are saying is true, why were these ‘AIs’ incapable of rendering a full wine glass? It ‘knows’ the concept of a full glass of water, but because of humanity’s social pressures (a full wine glass being the epitome of gluttony), artwork did not depict a full wine glass. No matter how AI prompters demanded it, the model was unable to link the concepts until such an image was literally created for it to regurgitate. It seems ‘AI’ doesn’t really learn, but regurgitates art in collages of taken assets, smoothed over at the seams.

  • I suppose someone could develop an LLM that digests textbooks, rewords the text, and spits it back out, then distribute it for free, page for page. I don't think you can copyright the math problems... so if the exact wording is what gives it protection, that would have been changed.

    If a human did that it’s still plagiarism.

  • What a bad judge.

    Why? He basically stated that you can use whatever material you want to train your model, as long as you ask the author (or copyright holder) for permission to use it (and presumably pay for it).

    "Fair use" is the exact opposite of what you're saying here. It means you don't need to ask for any permission. The judge ruled that obtaining illegitimate copies was unlawful, but that use without the creator's consent is perfectly fine.

  • Not at all true. AI doesn’t just reproduce content it was trained on, on demand.

    It can; the only thing stopping it is being specifically told not to, and that consideration being successfully checked for. It is completely capable of plagiarizing otherwise.

  • Gist:

    What’s new: The Northern District of California has granted a summary judgment for Anthropic, finding that the training use of the copyrighted books and the print-to-digital format change were both “fair use” (full order in the box below). However, the court also found that the pirated library copies that Anthropic collected could not be deemed training copies, and that the use of this material was therefore not “fair”. The court also announced that it will hold a trial on the pirated copies and any resulting damages, adding:

    “That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages.”

    Formatting thing: if you start a line in a new paragraph with four spaces, it assumes that you want to display the text as code and won't wrap the line.

    This means that the last part of your comment is one long line that people need to scroll to see. If you remove one of the spaces, or remove the empty line between it and the previous paragraph, it'll look like a normal comment.

    With an empty line of space:

    1 space - and a little bit of writing just to see how the text will wrap. I don't really have anything that I want to put here, but I need to put enough here to make it long enough to wrap around. This is likely enough.

    2 spaces - and a little bit of writing just to see how the text will wrap. I don't really have anything that I want to put here, but I need to put enough here to make it long enough to wrap around. This is likely enough.

    3 spaces - and a little bit of writing just to see how the text will wrap. I don't really have anything that I want to put here, but I need to put enough here to make it long enough to wrap around. This is likely enough.

    4 spaces -  and a little bit of writing just to see how the text will wrap. I don't really have anything that I want to put here, but I need to put enough here to make it long enough to wrap around. This is likely enough.
    
  • If a human did that it’s still plagiarism.

    Oh, I agree it should be, but following the judge's ruling, I don't see how it could be. You trained an LLM on textbooks that were purchased, not pirated, and the LLM distributed the responses.

    (Unless you mean the human reworded them; then yeah, we aren't special apparently.)

  • So I can't use any of these works because it's plagiarism but AI can?

    Why would it be plagiarism if you use the knowledge you gain from a book?

  • Does buying the book give you license to digitise it?

    Does owning a digital copy of the book give you license to convert it into another format and copy it into a database?

    Definitions of "Ownership" can be very different.

    You can digitize the books you own; you do not need a license for that. And of course you can put that digital copy into a database, since databases are explicit exceptions in copyright law. If you want to take it to the extreme: delete the first copy, and you have it only in the database. However, AIs/LLMs are not based on databases but on neural networks, and the original data gets lost when it is "learned".
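    A crude way to see the "original data gets lost" point (a toy sketch of my own, not a claim about how any particular LLM stores things): fit two parameters to a thousand points and the pattern survives, while the points themselves become unrecoverable.

        # Toy illustration: after "learning", only 2 parameters remain; the
        # 1,000 original points cannot be reconstructed from them.
        import numpy as np

        rng = np.random.default_rng(42)
        x = rng.uniform(0, 10, size=1000)
        y = 3.0 * x + 1.0 + rng.normal(scale=2.0, size=1000)  # the "training data"

        slope, intercept = np.polyfit(x, y, deg=1)            # the "learned" model
        print(slope, intercept)  # roughly 3.0 and 1.0: the general pattern survives
        # Nothing in these two numbers stores the individual (x, y) pairs.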

  • Oh, I agree it should be, but following the judge's ruling, I don't see how it could be. You trained an LLM on textbooks that were purchased, not pirated, and the LLM distributed the responses.

    (Unless you mean the human reworded them; then yeah, we aren't special apparently.)

    Yes, on the second part. Just rearranging or replacing words in a text is not transformative, which is a requirement. There is an argument that ‘AI’ is capable of doing transformative work, but the tokenizing and weighting process is not magic, and in my use of multiple LLMs they do not show an understanding of the material any more than a dictionary understands the material printed on its pages.

    An example was the wine glass problem. Art ‘AIs’ were unable to depict a wine glass filled to the brim. No matter how they were prompted, or what style they aped, they would fail to do so and report back that the glass was full. But they could render a full glass of water. The model didn’t understand what ‘full’ meant, not even for the water. How was this possible? Well, there was very little art of a full wine glass, because society has an unspoken rule that a full wine glass is the epitome of gluttony; wine is to be savored, not drunk. References to full glasses of water, by contrast, were abundant. The model doesn’t know what ‘full’ means, just that pictures of full glasses of water are tied to the phrases ‘full’, ‘glass’, and ‘water’.

  • If what you are saying is true, why were these ‘AIs’ incapable of rendering a full wine glass? It ‘knows’ the concept of a full glass of water, but because of humanity’s social pressures (a full wine glass being the epitome of gluttony), artwork did not depict a full wine glass. No matter how AI prompters demanded it, the model was unable to link the concepts until such an image was literally created for it to regurgitate. It seems ‘AI’ doesn’t really learn, but regurgitates art in collages of taken assets, smoothed over at the seams.

    Copilot did it just fine

  • Formatting thing: if you start a line in a new paragraph with four spaces, it assumes that you want to display the text as code and won't wrap the line.

    This means that the last part of your comment is one long line that people need to scroll to see. If you remove one of the spaces, or remove the empty line between it and the previous paragraph, it'll look like a normal comment.

    With an empty line of space:

    1 space - and a little bit of writing just to see how the text will wrap. I don't really have anything that I want to put here, but I need to put enough here to make it long enough to wrap around. This is likely enough.

    2 spaces - and a little bit of writing just to see how the text will wrap. I don't really have anything that I want to put here, but I need to put enough here to make it long enough to wrap around. This is likely enough.

    3 spaces - and a little bit of writing just to see how the text will wrap. I don't really have anything that I want to put here, but I need to put enough here to make it long enough to wrap around. This is likely enough.

    4 spaces -  and a little bit of writing just to see how the text will wrap. I don't really have anything that I want to put here, but I need to put enough here to make it long enough to wrap around. This is likely enough.
    

    Thanks, I had copy-pasted it from the website 🙂
