
Judge dismisses authors' copyright lawsuit against Meta over AI training

Technology
  • This post did not contain any content.
  • This post did not contain any content.

    This is the notorious lawsuit from a year ago:

    a group of well-known writers that includes comedian Sarah Silverman and authors Jacqueline Woodson and Ta-Nehisi Coates

    The judge ruled that AI training is fair use:

    But the actual process of an AI system distilling from thousands of written works to be able to produce its own passages of text qualified as “fair use” under U.S. copyright law because it was “quintessentially transformative,” Alsup wrote.

    This is the second judgement of this type this week.

  • This post did not contain any content.

    Bad judgement.

  • Alsup? Is this the same judge who also presided over Oracle v. Google over the use of Java in Android? That guy really does his homework on the cases he presides over; he learned how to code to see whether APIs are copyrightable.

    As for the ruling, I'm not in favour of AI training on copyrighted material, but I can see where the judgement is coming from. I think it's a matter of what's really copyrightable: the actual text or images, or the abstract knowledge in the material. In other words, if you were to read a book and then write a summary of a section of it in your own words, or orally describe what you learned from the book to someone else, would that be copyright infringement? Or if you watched a movie and then described your favourite scenes to your friends?

    Perhaps a case could be made that AI training on copyrighted materials is not the same as humans consuming copyrighted material, and therefore it should have a different provision in copyright law. I'm no lawyer, but I'd assume that current copyright law works on the basis that humans do not generally have perfect recall of the copyrighted material they consume. Then again, a counterargument could be that neither does the AI, given its tendency to hallucinate sometimes. However, it still has far better recall than humans, which could perhaps be grounds for amending copyright law with respect to AI training?

  • This post did not contain any content.

    Terrible judgement.

    Turn the K value down on the model and it reproduces text near verbatim.
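    If "K value" here means top-k sampling (an assumption; the comment doesn't say), the mechanism can be sketched in a few lines of Python. The helper name `top_k_sample` and the logits are hypothetical, made-up scores for a four-token vocabulary. With k=1 only the single most likely token survives, so every draw is identical; whether that reproduces training text near verbatim depends on how much a particular model has memorized, which this toy cannot show.

```python
import math
import random


def top_k_sample(logits, k, rng):
    """Sample a token index from only the k highest-scoring logits."""
    # Keep the indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the surviving logits (subtract the max for numerical stability).
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


# Hypothetical next-token scores for a 4-token vocabulary.
logits = [2.0, 1.5, 0.3, -1.0]
rng = random.Random(0)
print([top_k_sample(logits, 1, rng) for _ in range(5)])  # [0, 0, 0, 0, 0]
print([top_k_sample(logits, 3, rng) for _ in range(5)])  # drawn from indices 0, 1, 2
```

    Raising k lets lower-ranked tokens back in, which is where output variety comes from.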

  • Perhaps a case could be made that AI training on copyrighted materials is not the same as humans consuming copyrighted material, and therefore it should have a different provision in copyright law. I'm no lawyer, but I'd assume that current copyright law works on the basis that humans do not generally have perfect recall of the copyrighted material they consume. Then again, a counterargument could be that neither does the AI, given its tendency to hallucinate sometimes. However, it still has far better recall than humans, which could perhaps be grounds for amending copyright law with respect to AI training?

    Your last paragraph would be the ideal solution in an ideal world, but I don't think anything like this could happen in the current political and economic structures.

    First, it's super easy to hide all of this, and enforcement would be very difficult even domestically. Second, because we're in an AI race, no one would ever put themselves at such a disadvantage unless it's about real damage rather than economic copyright juggling.

    People need to come to terms with these facts so we can address real problems rather than blowing against the wind with all the whining we see on Lemmy. There are actual things we can do.

  • Terrible judgement.

    Turn the K value down on the model and it reproduces text near verbatim.

    Ah the Schrödinger's LLM - always hallucinating and also always accurate

  • Perhaps a case could be made that AI training on copyrighted materials is not the same as humans consuming copyrighted material, and therefore it should have a different provision in copyright law. I'm no lawyer, but I'd assume that current copyright law works on the basis that humans do not generally have perfect recall of the copyrighted material they consume. Then again, a counterargument could be that neither does the AI, given its tendency to hallucinate sometimes. However, it still has far better recall than humans, which could perhaps be grounds for amending copyright law with respect to AI training?

    Agree 100%.

    I hope we can refactor this whole copyright/patent concept soon.

    It's more of a pain for artists, creators, releasers, etc.

    I see it in EDM: I work at a label and sometimes produce a bit myself.

    Most artists work with samples, presets, etc., and keeping track of who worked on what and who owns what percentage of what just takes the joy out of creating.

    Same for game design: you have a vision for your game, make a proof of concept, and then have to change the whole game because stupid patent shit doesn't allow you to, e.g., land on a horse and immediately ride it, or throw stuff at things to catch them…

  • I'm inclined to agree. I hate AI, and I especially hate artists and other creatives being shafted, but I'm increasingly doubtful that copyright is an effective way to ensure that they get their fair share (whether we're talking about AI or otherwise).

  • Bad judgement.

    Any reason to say that other than that it didn't give the result you wanted?

  • I'm inclined to agree. I hate AI, and I especially hate artists and other creatives being shafted, but I'm increasingly doubtful that copyright is an effective way to ensure that they get their fair share (whether we're talking about AI or otherwise).

    In an ideal world, there would be something like a universal basic income, which would reduce the pressure on artists to earn enough from their art. That would let artists make art that is less mainstream and more unique, and would thus, in my opinion, allow copyright laws to be weakened.

    Well, that's where I would start if I were trying to bring about change.

  • Ah the Schrödinger's LLM - always hallucinating and also always accurate

    "hallucination refers to the generation of plausible-sounding but factually incorrect or nonsensical information"

    Is an output a hallucination when the training data behind that output included factually incorrect data? Suppose my input is "is the world flat" and an LLM, allegedly accurately, reproduces a flat-earther's writings saying it is.

  • First, it's super easy to hide all of this, and enforcement would be very difficult even domestically. Second, because we're in an AI race, no one would ever put themselves at such a disadvantage unless it's about real damage rather than economic copyright juggling.

    One way I could see this being enforced is by mandating that AI models not respond to questions that could lead them to talk about a copyrighted work, similar to how mainstream models avoid vulgar or controversial topics.

    But yeah, realistically, it's unlikely that any judge would rule in that favour.

  • This post did not contain any content.

    It sounds like the precedent has been set

  • This post did not contain any content.

    Grab em by the intellectual property! When you're a multi-billion dollar corporation, they just let you do it!

  • This post did not contain any content.

    I’ll leave this here from another post on this topic…

  • Ah the Schrödinger's LLM - always hallucinating and also always accurate

    Accuracy and hallucination are two ends of a spectrum.

    If you turn hallucinations to a minimum, the LLM will faithfully reproduce what's in the training set, but the result will not fit the query very well.

    The other option is to turn the so-called temperature up, which results in replies that fit the query better, but hallucinations also go up.

    In the end it's a balance between getting responses that are closer to the dataset (factual) or closer to the query (creative).
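    A minimal Python sketch of what the temperature knob does, assuming the common setup where logits are divided by the temperature before a softmax. The logits are made-up values and `softmax_with_temperature` is a hypothetical helper name. It only shows that low temperatures concentrate probability on the top token while high temperatures flatten the distribution; it says nothing about factuality or hallucination rates, which is the commenter's interpretation.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical next-token scores for a 4-token vocabulary.
logits = [2.0, 1.5, 0.3, -1.0]
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    # Lower t puts more mass on index 0; higher t spreads it out.
    print(t, [round(p, 3) for p in probs])
```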

  • Ah the Schrödinger's LLM - always hallucinating and also always accurate

    There is nothing intelligent about "AI" as we call it. It parrots based on probability. If you remove the randomness value from the model, it parrots the same thing every time based on its weights, and if the weights were trained on Harry Potter, it will consistently give you giant chunks of Harry Potter verbatim when prompted.

    Most of the LLM services attempt to avoid this by adding arbitrary randomness values to churn the soup. But this is also inherently part of the cause of hallucinations, as the model cannot preserve a single correct response as always the right way to respond to a certain query.

    LLMs are insanely "dumb"; they're just lightspeed parrots. The fact that Meta and these other giant tech companies claim it's not theft because they sprinkle in some randomness is just obscuring the reality and the fact that their models are derivative of the work of organizations like the BBC and Wikipedia, while also dependent on the works of tens of thousands of authors to develop their corpus of language.

    In short, there was an ethical way to train these models. But that would have been slower. And the court just basically gave them a pass on theft. Facebook would have been entirely in the clear had it not stored the books in a dataset, which in itself is insane.

    I wish I knew when I was younger that stealing is wrong, unless you steal at scale. Then it's just clever business.
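    The "remove the randomness" point in the comment above can be illustrated with greedy decoding on a toy model. The `BIGRAMS` table is a hypothetical stand-in for a model's learned weights (real LLMs compute next-token probabilities with a neural network, not a lookup table), and `generate` is an illustrative helper. Greedy decoding always takes the most probable next word, so it returns the same text on every run, while sampling can vary.

```python
import random

# Hypothetical toy "model": next-word probabilities keyed by the previous word.
BIGRAMS = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"boy": 0.5, "wizard": 0.3, "owl": 0.2},
    "a": {"wizard": 0.7, "boy": 0.3},
    "boy": {"</s>": 1.0},
    "wizard": {"</s>": 1.0},
    "owl": {"</s>": 1.0},
}


def generate(greedy, rng=None, max_len=10):
    """Generate words from BIGRAMS until the end marker or max_len."""
    word, out = "<s>", []
    for _ in range(max_len):
        options = BIGRAMS[word]
        if greedy:
            # No randomness: always take the most probable next word.
            word = max(options, key=options.get)
        else:
            # Sample the next word in proportion to its probability.
            word = rng.choices(list(options), weights=list(options.values()), k=1)[0]
        if word == "</s>":
            break
        out.append(word)
    return " ".join(out)


print({generate(greedy=True) for _ in range(5)})  # {'the boy'}
rng = random.Random(42)
print({generate(greedy=False, rng=rng) for _ in range(20)})  # may contain several outputs
```

    Deterministic output by itself only shows repeatability; how much of it matches training text verbatim depends on what a real model has memorized.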

  • Except that breaking copyright is not stealing and never was. It's hard to believe you'd ever see copyright advocates on FOSS and decentralized networks like Lemmy; it's like people had their minds hijacked because "big tech is bad".

  • What would you call making money from someone else's work or data without their consent or any compensation? If the tech were just tech, it wouldn't need any non-consenting human input to work properly. These are just companies feeding on various types of data. If the justice system doesn't protect an author, what do you think would happen if these same models started feeding on user data instead? The tech is good; the ethics are not.
