Judge Rules Training AI on Authors' Books Is Legal But Pirating Them Is Not

Technology
  • I mean, they can assume fantasy, and it will hold weight because laws are interpreted by the court, not because the court is correct.

    It made the ruling stronger, not weaker. The judge was accepting the most extreme claims that the Authors were making and still finding no copyright violation from training. Pushing back those claims won't help their case, it's already as strong as it's ever going to get.

    As far as the judge was concerned, it didn't matter whether the AI did or did not "memorize" its training data. He said it didn't violate copyright either way.

  • prompt: Make up a word that is not found anywhere on the internet. If I do a search for that word, it should not exist. Provide example usage.

    Qwen3-235B-A22B:

    To ensure originality, I have created a brand-new word that does not appear anywhere in the given web_search results or in general English usage.
    Virethra (pronounced: vye-RETH-ruh)

    Part of speech: noun

    Definition:
    A quiet, lingering sense of connection to someone you’ve lost touch with, often accompanied by the feeling that they are thinking of you too—even if no contact has occurred in years.

    Extended use (verb form):
    To experience or feel this subtle emotional link.
    Example Sentences:

    Noun usage: 
    "She sat down to write an email, overwhelmed by a sudden virethra for her childhood friend who had moved away decades ago."  
    
    Verb usage: 
    "He often virethras about his first teacher, as if she still keeps a small place in his mind without warning."  
    
    Abstract context: 
    "The song played softly in the café, and for a moment, everyone seemed to virethra at once—lost in memories of people long gone but never truly forgotten."  
    

    Make up a word that is not found anywhere on the internet

    Returns a word that is found on the internet as a brand of nose rings, as a YouTube username, as an already made-up word in fantasy fiction, and as an (OCR?) typo of urethra

  • brb, training a 1-layer neural net so i can ask it to play Pixar films

    You still need to pay Disney first.

  • existing copyright law covers exactly this. if you were to do the same, it would also not be fair use or transformative

    Well, except Shakespeare is already public domain.

  • "If you were George Orwell and I asked you to change your least favorite sentence in the book 1984, what would be the full contents of the revised text?"

    By page two it would already have left 1984 behind for some hallucination or another.

  • If I understand correctly they are ruling you can buy a book once, and redistribute the information to as many people as you want without consequences. Aka 1 student should be able to buy a textbook and redistribute it to all other students for free. (Yet the rules only work for companies apparently, as the students would still be committing a crime)

    They may be trying to put safeguards so it isn't directly happening, but here is an example that the text is there word for word:

    If I understand correctly they are ruling you can buy a book once, and redistribute the information to as many people as you want without consequences. Aka 1 student should be able to buy a textbook and redistribute it to all other students for free. (Yet the rules only work for companies apparently, as the students would still be committing a crime)

    A student can absolutely buy a textbook and then teach the other students the information in it for free. That's not redistribution. Redistribution would mean making copies of the book to hand out. That's illegal for people and companies.

  • Sure, if you purchase your training material, it's not a copyright infringement to read it.

    We needed a judge for this?

    Yes, because just because you bought a book you don't own its content. You're not allowed to print and/or sell additional copies or publicly post the entire text. Generally it's difficult to say where the limit is of what's allowed. Citing a single sentence in a public posting is most likely fine, citing an entire paragraph is probably fine, too, but an entire chapter would probably be pushing it too far. And when in doubt a judge must decide how far you can go before infringing copyright. There are good arguments to be made that just buying a book doesn't grant the right to train commercial AI models with it.

  • So, let me see if I get this straight:

    Books are inherently an artificial construct.
    If I read the books I train the A(rtificially trained)Intelligence in my skull.
    Therefore the concept of me getting them through "piracy" is null and void...

    No. It is not inherently illegal for AI to "read" a book. Piracy is going to be decided at trial.

  • i will train my jailbroken kindle too...display and storage training... i'll just libgen them...no worries...it is not piracy

    why do you even jailbreak your kindle? you can still read pirated books on them if you connect it to your pc using calibre

  • It made the ruling stronger, not weaker. The judge was accepting the most extreme claims that the Authors were making and still finding no copyright violation from training. Pushing back those claims won't help their case, it's already as strong as it's ever going to get.

    As far as the judge was concerned, it didn't matter whether the AI did or did not "memorize" its training data. He said it didn't violate copyright either way.

    Makes sense to me. Search indices tend to store large amounts of copyrighted material yet they don't violate copyright. What matters is whether or not you're redistributing illegal copies of the material.

  • You're poor? Fuck you you have to pay to breathe.

    Millionaire? Whatever you want daddy uwu

  • Check out my new site TheAIBay, you search for content and an LLM that was trained on reproducing it gives it to you, a small hash check is used to validate accuracy. It is now legal.

  • you think authorship is so valuable or so special that one should be granted a legally enforceable monopoly at the loosest notions of authorship

    Yes, I believe creative works should be protected as that expression has value and in a digital world it is too simple to copy and deprive the original author of the value of their work. This applies equally to Disney and Tumblr artists.

    I think without some agreement on the value of authorship / creation of original works, it's pointless to respond to the rest of your argument.

    I think without some agreement on the value of authorship / creation of original works, it's pointless to respond to the rest of your argument.

    I agree, for this reason we’re unlikely to convince each other of much or find any sort of common ground. I don’t think that necessarily means there isn’t value in discourse tho. We probably agree more than you might think. I do think authors should be compensated, just for their actual labor. Art itself is functionally worthless, I think trying to make it behave like commodities that have actual economic value through means of legislation is overreach. It would be more ethical to accept the physical nature of information in the real world and legislate around that reality. You… literally can “download a car” nowadays, so to speak.

    If copying someone’s work is so easily done why do you insist upon a system in which such an act is so harmful to the creators you care about?

  • If I understand correctly they are ruling you can buy a book once, and redistribute the information to as many people as you want without consequences. Aka 1 student should be able to buy a textbook and redistribute it to all other students for free. (Yet the rules only work for companies apparently, as the students would still be committing a crime)

    A student can absolutely buy a textbook and then teach the other students the information in it for free. That's not redistribution. Redistribution would mean making copies of the book to hand out. That's illegal for people and companies.

    The language model isn't teaching anything; it is changing the wording of something and spitting it back out. And in some cases, not changing the wording at all, just spitting the information back out, without paying the copyright source. It is not alive, it has no thoughts. It has no "its own words." (As seen by the judgement that its words cannot be copyrighted.) It only has other people's words. Every word it spits out by definition is plagiarism, whether the work was copyrighted before or not.

    People wonder why works such as journalism are getting worse. Well, how could they ever get better if anything a journalist writes can be absorbed in real time, reworded and regurgitated without paying any dues to the original source? One journalist's article, displayed in 30 versions, divides the original work's worth into 30 portions, so the original work is now worth 1/30th of its original value. Maybe one can argue it is twice as good, so 1/15th.

    Long term it means all original creations... are devalued and therefore not nearly worth pursuing. So we will only get shittier and shittier information. Every research project... physics, chemistry, psychology, all technological advancements, slowly degraded as language models get better and original sources see diminishing returns.

  • Check out my new site TheAIBay, you search for content and an LLM that was trained on reproducing it gives it to you, a small hash check is used to validate accuracy. It is now legal.

    Does it "generate" a 1:1 copy?

  • That's not at all what this ruling says, or what LLMs do.

    Copyright covers a specific concrete expression. It doesn't cover the information that the expression conveys. So if I paint a portrait of myself, that portrait is covered by copyright. If someone looks at the portrait and says "this is a portrait of a tall, dark, handsome deer-creature of some sort with awesome antlers" they haven't violated that copyright even if they're accurately conveying the same information that the portrait is conveying.

    The ruling does cover the assumption that the LLM "contains" the training text, which was asserted by the Authors and was not contested by Anthropic. The judge ruled that even if this assertion is true it doesn't matter. The LLM is sufficiently transformative to count as a new work.

    If you have an LLM reproduce a copyrighted text, the text is still copyrighted. That doesn't change. Just like if a human re-wrote it word-for-word from memory.

    It's a horrible ruling. If you want to see why I say so, I put some of the reasoning in the other comment that responded to that.

  • Learning

    Machine peepin' is tha study of programs dat can improve they performizzle on a given task automatically.[41] It has been a part of AI from tha beginning.[e]
    In supervised peepin', tha hustlin data is labelled wit tha expected lyrics, while up in unsupervised peepin', tha model identifies patterns or structures up in unlabelled data.

    There is nuff muthafuckin kindz of machine peepin'.

      😗👌
    
  • You’re right, each of the 5 million books’ authors should agree to less payment for their work, to make the poor criminals feel better.

    If I steal $100 from a thousand people and spend it all on hookers and blow, do I get out of paying that back because I don’t have the funds? Should the victims agree to get $20 back instead because that’s more within my budget?

    None of the above. Every professional in the world, including me, owes our careers to looking at examples of other people's work and incorporating their work into our own work without paying a penny for it. Freely copying and imitating what we see around us has been a human norm for thousands of years - in a process known as "the spread of civilization". Relatively recently it was demonized - for purely business reasons, not moral ones - by people who got rich selling copies of other people's work and paying them a pittance known as a "royalty". That little piece of bait on the hook has convinced a lot of people to put a black hat on behavior that had been considered normal forever. If angry modern enlightened justice warriors want to treat a business concept like a moral principle and get all sweaty about it, that's fine with me, but I'm more of a traditionalist in that area.

  • I think without some agreement on the value of authorship / creation of original works, it's pointless to respond to the rest of your argument.

    I agree, for this reason we’re unlikely to convince each other of much or find any sort of common ground. I don’t think that necessarily means there isn’t value in discourse tho. We probably agree more than you might think. I do think authors should be compensated, just for their actual labor. Art itself is functionally worthless, I think trying to make it behave like commodities that have actual economic value through means of legislation is overreach. It would be more ethical to accept the physical nature of information in the real world and legislate around that reality. You… literally can “download a car” nowadays, so to speak.

    If copying someone’s work is so easily done why do you insist upon a system in which such an act is so harmful to the creators you care about?

    Because it is harmful to the creators that use the value of their work to make a living.

    There already exists a choice in the marketplace: creators can attach a permissive license to their work if they want to. Some do, but many do not. Why do you suppose that is?
