
Judge dismisses authors' copyright lawsuit against Meta over AI training

Technology
  • This is the notorious lawsuit from a year ago:

    a group of well-known writers that includes comedian Sarah Silverman and authors Jacqueline Woodson and Ta-Nehisi Coates

    The judge holds that AI training is fair use:

    But the actual process of an AI system distilling from thousands of written works to be able to produce its own passages of text qualified as “fair use” under U.S. copyright law because it was “quintessentially transformative,” Alsup wrote.

    This is the second judgement of this type this week.

  • Bad judgement.

  • Alsup? Is this the same judge who also presided over Oracle v. Google, the case about the use of Java in Android? That guy really does his homework on the cases he presides over; he learned how to code to see whether APIs are copyrightable.

    As for the ruling, I'm not in favour of AI training on copyrighted material, but I can see where the judgement is coming from. I think it's a matter of what's really copyrightable: the actual text or images, or the abstract knowledge in the material. In other words, if you were to read a book and then write a summary of a section of it in your own words, or orally describe what you learned from the book to someone else, would that be copyright infringement? Or if you watched a movie and then described your favourite scenes to your friends?

    Perhaps a case could be made that AI training on copyrighted material is not the same as humans consuming the copyrighted material, and therefore it should have a different provision in copyright law. I'm no lawyer, but I'd assume that current copyright law works on the basis that humans do not generally have perfect recall of the copyrighted material they consume. Then again, a counter-argument could be that neither does the AI, given its tendency to sometimes hallucinate. Still, it has superior recall compared to humans, and perhaps that could be grounds for amending copyright law to address AI training.

  • Terrible judgement.

    Turn the K value (top-k sampling) down on the model and it reproduces text near verbatim, as the sketch below shows.
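
    A minimal sketch of the mechanism being described, using a toy next-token distribution rather than a real model: with top-k sampling, a small k discards all but the most probable continuations, so a model that has memorized a passage emits it near-deterministically.

    ```python
    import numpy as np

    def top_k_sample(logits: np.ndarray, k: int, rng: np.random.Generator) -> int:
        """Sample a token id from only the k highest-scoring logits."""
        top = np.argsort(logits)[-k:]                  # indices of the k largest logits
        probs = np.exp(logits[top] - logits[top].max())
        probs /= probs.sum()                           # softmax over the top-k only
        return int(rng.choice(top, p=probs))

    # Toy distribution: token 0 is the memorized continuation.
    logits = np.array([5.0, 2.0, 1.5, 1.0, 0.5])
    rng = np.random.default_rng(0)

    for k in (5, 1):
        draws = [top_k_sample(logits, k, rng) for _ in range(10)]
        print(f"k={k}: {draws}")
    # k=5 still mostly picks token 0; k=1 picks it every time -- greedy decoding,
    # i.e. verbatim reproduction of whatever the weights have memorized.
    ```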

  • Your last paragraph would be the ideal solution in an ideal world, but I don't think anything like it could happen within the current political and economic structures.

    First, it's super easy to hide all of this, and enforcement would be very difficult even domestically. Second, because we're in an AI race, no one would ever put themselves at such a disadvantage unless there's real damage at stake, not economic copyright juggling.

    People need to come to terms with these facts so we can address real problems rather than blow against the wind with all this whining we see on Lemmy. There are actual things we can do.

  • Ah, the Schrödinger's LLM: always hallucinating and also always accurate.

  • Agree 100%.

    I hope we can refactor this whole copyright/patent concept soon.

    It's more of a pain for artists, creators, labels, etc.

    I see it with EDM: I work at a label and sometimes produce a bit myself.

    Most artists work with samples, presets, etc., and keeping track of who worked on what and who owns what percentage of what just takes the joy out of creating.

    Same for game design: you have a vision for your game, build a PoC, and then have to change the whole game because of stupid patent shit not allowing you to, e.g., land on a horse and immediately ride it, or throw stuff at things to catch them…

  • I'm inclined to agree. I hate AI, and I especially hate artists and other creatives being shafted, but I'm increasingly doubtful that copyright is an effective way to ensure that they get their fair share (whether we're talking about AI or otherwise).

  • Bad judgement.

    Any reason to say that, other than that it didn't give the result you wanted?

  • In an ideal world, there would be something like a universal basic income, which would reduce the pressure on artists to generate enough income from their art. That would let artists make art that is less mainstream and more unique, and thus, in my opinion, would make it possible to weaken copyright laws.

    Well, that's where I would try to start the change, anyway.

  • "Hallucination refers to the generation of plausible-sounding but factually incorrect or nonsensical information."

    Is an output a hallucination when the training data involved in that output included factually incorrect data? Suppose my input is "is the world flat" and the LLM then, allegedly accurately, reproduces a flat-earther's writings saying it is.

  • One way I could see this being enforced is by mandating that AI models not respond to questions that could result in speaking about a copyrighted work, similar to how mainstream models won't speak about vulgar or controversial topics (a naive filter of that sort is sketched below).

    But yeah, realistically, it's unlikely that any judge would rule in that favour.
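
    A minimal sketch of the kind of guardrail being described, assuming a hypothetical denylist of protected titles; real deployments use trained safety classifiers rather than substring checks, so this is illustrative only.

    ```python
    # Hypothetical, naive guardrail: refuse prompts that mention a denylisted work.
    # Production systems use trained classifiers, not string matching.
    DENYLISTED_WORKS = {"harry potter", "the hunger games"}  # illustrative titles

    def guard(prompt: str) -> str | None:
        """Return a refusal message if the prompt touches a denylisted work."""
        lowered = prompt.lower()
        for title in DENYLISTED_WORKS:
            if title in lowered:
                return "I can't reproduce or discuss that copyrighted work."
        return None  # None means the prompt may proceed to the model

    print(guard("Quote the first page of Harry Potter"))  # refusal message
    print(guard("Explain fair use in US copyright law"))  # None -> allowed
    ```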

  • It sounds like the precedent has been set.

  • Grab 'em by the intellectual property! When you're a multi-billion dollar corporation, they just let you do it!

  • I’ll leave this here from another post on this topic…

  • Accuracy and hallucination are two ends of a spectrum.

    If you turn hallucination down to a minimum, the LLM will faithfully reproduce what's in the training set, but the result will not fit the query very well.

    The other option is to turn the so-called temperature up, which results in replies that fit the query better, but hallucinations go up as well.

    In the end it's a balance between responses that are closer to the dataset (factual) and responses that are closer to the query (creative); see the sketch below.
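
    A minimal sketch of the temperature knob itself, on a toy distribution (not a claim about any particular model): dividing the logits by a temperature below 1 sharpens the distribution toward the most likely token, while a temperature above 1 flattens it, spreading probability onto less likely continuations.

    ```python
    import numpy as np

    def softmax_with_temperature(logits: np.ndarray, temperature: float) -> np.ndarray:
        """Convert logits to probabilities, sharpened or flattened by temperature."""
        scaled = logits / temperature
        scaled -= scaled.max()            # subtract max for numerical stability
        probs = np.exp(scaled)
        return probs / probs.sum()

    logits = np.array([3.0, 1.5, 1.0, 0.2])
    for t in (0.2, 1.0, 2.0):
        print(f"T={t}: {np.round(softmax_with_temperature(logits, t), 3)}")
    # T=0.2 puts almost all mass on the top token (closest to the training data);
    # T=2.0 spreads mass out, making rarer (possibly hallucinated) tokens likely.
    ```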

  • There is nothing intelligent about "AI" as we call it. It parrots based on probability. If you remove the randomness value from the model, it parrots the same thing every time based on its weights, and if the weights were trained on Harry Potter, it will consistently give you giant chunks of Harry Potter verbatim when prompted (a deterministic-decoding sketch follows below).

    Most of the LLM services attempt to avoid this by adding arbitrary randomness values to churn the soup. But this is also inherently part of the cause of hallucinations, as the model cannot preserve a single correct response as the one right way to answer a given query.

    LLMs are insanely "dumb"; they're just lightspeed parrots. The fact that Meta and these other giant tech companies claim it's not theft because they sprinkle in some randomness just obscures the reality: their models are derivative of the work of organizations like the BBC and Wikipedia, while also depending on the works of tens of thousands of authors to develop their corpus of language.

    In short, there was an ethical way to train these models. But that would have been slower. And the court just basically gave them a pass on theft. Facebook would have been entirely in the clear had it not stored the books in a dataset, which in itself is insane.

    I wish I had known when I was younger that stealing is wrong, unless you steal at scale. Then it's just clever business.
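
    A minimal sketch of the determinism claim, using a toy next-token table standing in for trained weights (hypothetical data, not an actual model): with greedy decoding and no sampling noise, the same prompt always yields the same continuation.

    ```python
    # Toy "model": a fixed next-token table standing in for trained weights.
    # With greedy decoding (always take the single stored continuation, no
    # randomness), generation is a pure function of the weights -- the same
    # prompt yields the same text on every run.
    NEXT_TOKEN = {  # hypothetical memorized continuations
        "the": "boy",
        "boy": "who",
        "who": "lived",
    }

    def generate(prompt: str, steps: int = 3) -> str:
        tokens = prompt.split()
        for _ in range(steps):
            tokens.append(NEXT_TOKEN.get(tokens[-1], "<eos>"))
        return " ".join(tokens)

    print(generate("the"))  # "the boy who lived" -- identical on every run
    print(generate("the"))  # same output again: no randomness, no variation
    ```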

  • Except that breaking copyright is not stealing and never was. It's hard to believe you'd ever see copyright advocates on FOSS and decentralized networks like Lemmy; it's like people had their minds hijacked because "big tech is bad".

  • What name do you have for the activity of making money from someone else's work or data, without their consent and without compensation? If the tech were just tech, it wouldn't need any non-consenting human input to work properly. These are just companies feeding on various types of data; if the justice system doesn't protect an author, what do you think would happen if these same models started feeding on user data instead? Tech is good; the ethics are not.
