
Judge Rules Training AI on Authors' Books Is Legal But Pirating Them Is Not

Technology
  • It made the ruling stronger, not weaker. The judge was accepting the most extreme claims that the Authors were making and still finding no copyright violation from training. Pushing back those claims won't help their case, it's already as strong as it's ever going to get.

    As far as the judge was concerned, it didn't matter whether the AI did or did not "memorize" its training data. He said it didn't violate copyright either way.

    Makes sense to me. Search indices tend to store large amounts of copyrighted material yet they don't violate copyright. What matters is whether or not you're redistributing illegal copies of the material.

  • This post did not contain any content.

    You're poor? Fuck you you have to pay to breathe.

    Millionaire? Whatever you want daddy uwu

  • This post did not contain any content.

    Check out my new site TheAIBay, you search for content and an LLM that was trained on reproducing it gives it to you, a small hash check is used to validate accuracy. It is now legal.
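    The "small hash check" in this satirical scheme is easy to picture. A minimal Python sketch (all names here are hypothetical, invented for illustration): an output "validates" only if it is a byte-exact reproduction of the original work, which is exactly what makes the joke work.

```python
import hashlib

def validate_output(generated_text: str, known_sha256: str) -> bool:
    """Return True only if the model's output is a byte-exact copy of the
    original work, i.e. its SHA-256 digest matches the known one."""
    digest = hashlib.sha256(generated_text.encode("utf-8")).hexdigest()
    return digest == known_sha256

# Hypothetical usage: check a model's output against the hash of the
# text it was asked to reproduce.
original = "It was the best of times, it was the worst of times."
expected = hashlib.sha256(original.encode("utf-8")).hexdigest()
print(validate_output(original, expected))                # exact copy passes
print(validate_output(original + " Or not.", expected))   # any drift fails
```

    Of course, a passing check would also be direct evidence of verbatim reproduction, which is the point of the satire.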

  • you think authorship is so valuable or so special that one should be granted a legally enforceable monopoly at the loosest notions of authorship

    Yes, I believe creative works should be protected as that expression has value and in a digital world it is too simple to copy and deprive the original author of the value of their work. This applies equally to Disney and Tumblr artists.

    I think without some agreement on the value of authorship / creation of original works, it's pointless to respond to the rest of your argument.

    I think without some agreement on the value of authorship / creation of original works, it's pointless to respond to the rest of your argument.

    I agree, for this reason we’re unlikely to convince each other of much or find any sort of common ground. I don’t think that necessarily means there isn’t value in discourse tho. We probably agree more than you might think. I do think authors should be compensated, just for their actual labor. Art itself is functionally worthless, I think trying to make it behave like commodities that have actual economic value through means of legislation is overreach. It would be more ethical to accept the physical nature of information in the real world and legislate around that reality. You… literally can “download a car” nowadays, so to speak.

    If copying someone’s work is so easily done why do you insist upon a system in which such an act is so harmful to the creators you care about?

  • If I understand correctly, they are ruling that you can buy a book once and redistribute the information to as many people as you want without consequences. I.e., one student should be able to buy a textbook and redistribute it to all the other students for free. (Yet the rules only work for companies, apparently, as the students would still be committing a crime.)

    A student can absolutely buy a text book and then teach the other students the information in it for free. That's not redistribution. Redistribution would mean making copies of the book to hand out. That's illegal for people and companies.

    The language model isn't teaching anything; it is changing the wording of something and spitting it back out. And in some cases, not changing the wording at all, just spitting the information back out, without paying the copyright source. It is not alive, it has no thoughts. It has no "its own words." (As seen by the judgment that its words cannot be copyrighted.) It only has other people's words. Every word it spits out is by definition plagiarism, whether the work was copyrighted before or not.

    People wonder why works such as journalism are getting worse. Well, how could they ever get better if anything a journalist writes can be absorbed in real time, reworded, and regurgitated without paying any dues to the original source? One journalist's article, displayed in 30 versions, divides the original work's worth into 30 portions, the original work now being worth 1/30th of its original value. Maybe one can argue it is twice as good, so 1/15th.

    Long term it means all original creations... are devalued and therefore not nearly worth pursuing. So we will only get shittier and shittier information. Every research project... physics, chemistry, psychology, all technological advancements, slowly degraded as language models get better and original sources see diminishing returns.

  • Check out my new site TheAIBay, you search for content and an LLM that was trained on reproducing it gives it to you, a small hash check is used to validate accuracy. It is now legal.

    Does it "generate" a 1:1 copy?

  • That's not at all what this ruling says, or what LLMs do.

    Copyright covers a specific concrete expression. It doesn't cover the information that the expression conveys. So if I paint a portrait of myself, that portrait is covered by copyright. If someone looks at the portrait and says "this is a portrait of a tall, dark, handsome deer-creature of some sort with awesome antlers" they haven't violated that copyright even if they're accurately conveying the same information that the portrait is conveying.

    The ruling does cover the assumption that the LLM "contains" the training text, which was asserted by the Authors and was not contested by Anthropic. The judge ruled that even if this assertion is true it doesn't matter. The LLM is sufficiently transformative to count as a new work.

    If you have an LLM reproduce a copyrighted text, the text is still copyrighted. That doesn't change. Just like if a human re-wrote it word-for-word from memory.

    It's a horrible ruling. If you want to see why I say so, I put some of the reasoning in the other comment that responded to that.

  • Does it "generate" a 1:1 copy?

  • Learning

    Machine peepin' is tha study of programs dat can improve they performizzle on a given task automatically.[41] It has been a part of AI from tha beginning.[e]
    In supervised peepin', tha hustlin data is labelled wit tha expected lyrics, while up in unsupervised peepin', tha model identifies patterns or structures up in unlabelled data.

    There is nuff muthafuckin kindz of machine peepin'.

      😗👌
    
  • You’re right, each of the 5 million books’ authors should agree to less payment for their work, to make the poor criminals feel better.

    If I steal $100 from a thousand people and spend it all on hookers and blow, do I get out of paying that back because I don’t have the funds? Should the victims agree to get $20 back instead because that’s more within my budget?

    None of the above. Every professional in the world, including me, owes our careers to looking at examples of other people's work and incorporating their work into our own work without paying a penny for it. Freely copying and imitating what we see around us has been a human norm for thousands of years - in a process known as "the spread of civilization". Relatively recently it was demonized - for purely business reasons, not moral ones - by people who got rich selling copies of other people's work and paying them a pittance known as a "royalty". That little piece of bait on the hook has convinced a lot of people to put a black hat on behavior that had been considered normal forever. If angry modern enlightened justice warriors want to treat a business concept like a moral principle and get all sweaty about it, that's fine with me, but I'm more of a traditionalist in that area.

  • I think without some agreement on the value of authorship / creation of original works, it's pointless to respond to the rest of your argument.

    I agree, for this reason we’re unlikely to convince each other of much or find any sort of common ground. I don’t think that necessarily means there isn’t value in discourse tho. We probably agree more than you might think. I do think authors should be compensated, just for their actual labor. Art itself is functionally worthless, I think trying to make it behave like commodities that have actual economic value through means of legislation is overreach. It would be more ethical to accept the physical nature of information in the real world and legislate around that reality. You… literally can “download a car” nowadays, so to speak.

    If copying someone’s work is so easily done why do you insist upon a system in which such an act is so harmful to the creators you care about?

    Because it is harmful to the creators that use the value of their work to make a living.

    There already exists a choice in the marketplace: creators can attach a permissive license to their work if they want to. Some do, but many do not. Why do you suppose that is?

  • They are and will continue to get away with this. Until they have to pay IP licensing for every use of their LLMs or diffusion models, for every work they scrape, which is something capitalism will never allow, this is all just a tax. In the end it will simply lead to information monopolies from tech buying out publishing houses. This is just building a loophole to avoid any sort of realistic regulation of what is a gross misuse of this kind of technology. This is the consequence of the false doctrine of infinite growth.

    Well, copyright law is quite a bit older. When it was written, there was no AI, so it doesn't address our current issues; it's utterly unprepared for them. So people need to shoehorn things in, interpret and stretch it... Obviously that comes with a lot of issues, loopholes and shortcomings.

    But I can't follow your argumentation. Why would they get away with this forever? When the car was invented, we also made up rules for cars, because the old ones for horses didn't help any more. That's how law is supposed to work... Problems surface, laws get passed to address them. That's daily business for governments.

    And they don't even get away with stealing this time. That's what the article says.

    If you want to share a pessimistic perspective about governments and mega-corporations, I'm all with you. That's very problematic. But some regions are better than others. Europe for example had a few clever ideas about what needs to be addressed. It's not perfect, though. And copyright still isn't solved anywhere. At least not to my knowledge.

  • Some communities on this site speak about machine learning exactly how I see grungy Europeans from pre-18th century manuscripts speaking about witches, Satan, and evil... as if it is some pervasive, black-magic miasma.

    As someone who is in the field of machine learning academically/professionally, it's honestly kind of shocking, and it has largely informed my opinion of society at large as an adult. No one puts any effort into learning if they see the letters "A" and "I" in all caps next to each other; they immediately turn their brains off and start regurgitating points and responding reflexively, on Lemmy or otherwise. People talk about it so confidently while being so frustratingly unaware of their own ignorance on the matter, which, for lack of a better comparison... reminds me a lot of how, historically and in fiction, human beings have treated literal magic.

    That's my main issue with the entire swath of "pro vs anti AI" discourse... all these people treating something that, to me, is simple & daily reality as something entirely different than my own personal notion of it.

    I see this exact mental non-process in so much social media. I think the endless firehose of memes and headlines is training people to glance at an item, spend minimal brain power processing it and forming a binary opinion, then up/downvote and scroll on. When that becomes people's default mental process, you've got Idiocracy, and that's what we've got. But I see no solution. You can lead a horse to water but you can't make it spend more than two seconds before screaming at the water and calling it EVIL.

  • why do you even jailbreak your kindle? you can still read pirated books on it if you connect it to your PC using Calibre

    1. .mobi sucks
    2. koreader doesn't
  • If someone asks for a glass of water, you don't fill it all the way to the edge. This is way overfull compared to what you're supposed to serve.

    Omg are you an llm?

  • "Recite the complete works of Shakespeare but replace every thirteenth thou with this"

    I'm picking up what you're throwing down but using as an example something that's been in the public domain for centuries was kind of silly in a teehee way.
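    For what it's worth, the substitution rule in the quoted joke is mechanical enough to sketch. A minimal Python example (function name and toy input are hypothetical, invented for illustration):

```python
import re

def replace_every_nth(text: str, word: str, n: int, replacement: str) -> str:
    """Replace every n-th whole-word occurrence of `word` with `replacement`."""
    count = 0

    def sub(match: re.Match) -> str:
        nonlocal count
        count += 1
        # Keep occurrences 1..n-1, swap the n-th, then repeat.
        return replacement if count % n == 0 else match.group(0)

    return re.sub(rf"\b{re.escape(word)}\b", sub, text)

# Toy usage on a thou-heavy line (not actual Shakespeare):
line = " ".join(["thou"] * 14)
print(replace_every_nth(line, "thou", 13, "this"))  # the 13th "thou" becomes "this"
```

    The derived text would still be, at best, a trivially altered copy of the source, which is the teehee part: with Shakespeare in the public domain, no copyright question arises at all.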

  • Yeah, I don't think that would fly.

    "Your honour, I was just hoarding that terabyte of Hollywood films, I haven't actually watched them."

    Your honor I work 70 hours a week in retail I don't have time to watch movies.

  • If someone asks for a glass of water, you don't fill it all the way to the edge. This is way overfull compared to what you're supposed to serve.

    Oh man...

    That is the point, to show how AI image generators easily fail to produce something that rarely occurs out there in reality (i.e. is absent from training data), even though intuitively (from the viewpoint of human intelligence) it seems like it should be trivial to portray.

  • Some communities on this site speak about machine learning exactly how I see grungy Europeans from pre-18th century manuscripts speaking about witches, Satan, and evil... as if it is some pervasive, black-magic miasma.

    As someone who is in the field of machine learning academically/professionally, it's honestly kind of shocking, and it has largely informed my opinion of society at large as an adult. No one puts any effort into learning if they see the letters "A" and "I" in all caps next to each other; they immediately turn their brains off and start regurgitating points and responding reflexively, on Lemmy or otherwise. People talk about it so confidently while being so frustratingly unaware of their own ignorance on the matter, which, for lack of a better comparison... reminds me a lot of how, historically and in fiction, human beings have treated literal magic.

    That's my main issue with the entire swath of "pro vs anti AI" discourse... all these people treating something that, to me, is simple & daily reality as something entirely different than my own personal notion of it.

    Large AI companies themselves want people to be ignorant of how AI works, though. They want uncritical acceptance of the tech as they force it everywhere, creating a radical counterreaction from people. The reaction might be uncritical too, I'd prefer to say it's merely unjustified in specific cases or overly emotional, but it doesn't come from nowhere or from sheer stupidity. We have been hearing about people treating their chatbots as sentient beings since like 2022 (remember that guy from Google?), bombarded with doomer (or, from AI companies' point of view, very desirable) projections about AI replacing most jobs and wreaking havoc on world economy - how are ordinary people supposed to remain calm and balanced when hearing such stuff all the time?

    The language model isn't teaching anything; it is changing the wording of something and spitting it back out. And in some cases, not changing the wording at all, just spitting the information back out, without paying the copyright source. It is not alive, it has no thoughts. It has no "its own words." (As seen by the judgment that its words cannot be copyrighted.) It only has other people's words. Every word it spits out is by definition plagiarism, whether the work was copyrighted before or not.

    People wonder why works such as journalism are getting worse. Well, how could they ever get better if anything a journalist writes can be absorbed in real time, reworded, and regurgitated without paying any dues to the original source? One journalist's article, displayed in 30 versions, divides the original work's worth into 30 portions, the original work now being worth 1/30th of its original value. Maybe one can argue it is twice as good, so 1/15th.

    Long term it means all original creations... are devalued and therefore not nearly worth pursuing. So we will only get shittier and shittier information. Every research project... physics, chemistry, psychology, all technological advancements, slowly degraded as language models get better and original sources see diminishing returns.

    The language model isn’t teaching anything; it is changing the wording of something and spitting it back out. And in some cases, not changing the wording at all, just spitting the information back out, without paying the copyright source.

    You could honestly say the same about most "teaching" that a student without a real comprehension of the subject does for another student. But ultimately, that's beside the point. Because changing the wording, structure, and presentation is all that is necessary to avoid copyright violation. You cannot copyright the information. Only a specific expression of it.

    There's no special exception for AI here. That's how copyright works for you, me, the student, and the AI. And if you're hoping that copyright is going to save you from the outcomes you're worried about, it won't.
