
Judge dismisses authors' copyright lawsuit against Meta over AI training

Technology
  • This post did not contain any content.
  • This post did not contain any content.

    This is the notorious lawsuit from a year ago:

    a group of well-known writers that includes comedian Sarah Silverman and authors Jacqueline Woodson and Ta-Nehisi Coates

    The judge holds that AI training is fair use:

    But the actual process of an AI system distilling from thousands of written works to be able to produce its own passages of text qualified as “fair use” under U.S. copyright law because it was “quintessentially transformative,” Alsup wrote.

    This is the second judgement of this type this week.

  • This post did not contain any content.

    Bad judgement.

  • This is the notorious lawsuit from a year ago:

    a group of well-known writers that includes comedian Sarah Silverman and authors Jacqueline Woodson and Ta-Nehisi Coates

    The judge holds that AI training is fair use:

    But the actual process of an AI system distilling from thousands of written works to be able to produce its own passages of text qualified as “fair use” under U.S. copyright law because it was “quintessentially transformative,” Alsup wrote.

    This is the second judgement of this type this week.

    Alsup? Is this the same judge who also presided over Oracle v. Google, over the use of Java in Android? That guy really does his homework on the cases he presides over; he learned how to code to see whether APIs are copyrightable.

    As for the ruling, I'm not in favour of AI training on copyrighted material, but I can see where the judgement is coming from. I think it's a matter of what's really copyrightable: the actual text or images, or the abstract knowledge in the material. In other words, if you were to read a book and then write a summary of a section of it in your own words, or orally describe what you learned from the book to someone else, would that count as copyright infringement? Or if you watched a movie and then described your favourite scenes to your friends?

    Perhaps a case could be made that AI training on copyrighted materials is not the same as humans consuming the copyrighted material, and therefore it should have a different provision in copyright law. I'm no lawyer, but I'd assume that current copyright law works on the basis that humans do not generally have perfect recall of the copyrighted material they consume. Then again, a counter-argument could be that neither does the AI, given its tendency to hallucinate. However, it still has superior recall compared to humans, and perhaps that could be grounds for amending copyright law to cover AI training?

  • This post did not contain any content.

    Terrible judgement.

    Turn the K value down on the model and it reproduces text near verbatim.
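
    For anyone wondering, "the K value" presumably refers to top-k sampling, the decoding knob that limits how many candidate tokens the model may choose between. A minimal sketch with toy logits (not any real model's) of how k=1 removes the randomness from the choice entirely:

    ```python
    import numpy as np

    def sample_top_k(logits, k, rng=np.random.default_rng()):
        """Keep only the k highest-scoring tokens, renormalize, then sample.
        With k=1 this collapses to argmax: the single likeliest token,
        every time, with no randomness left in the choice."""
        top = np.argsort(logits)[-k:]            # indices of the k best tokens
        probs = np.exp(logits[top] - logits[top].max())
        probs /= probs.sum()                     # softmax over the survivors
        return int(rng.choice(top, p=probs))

    logits = np.array([2.0, 1.0, 0.5, -1.0])     # toy next-token scores
    print([sample_top_k(logits, k=1) for _ in range(5)])  # always token 0
    print([sample_top_k(logits, k=3) for _ in range(5)])  # varies between runs
    ```

    Whether a near-deterministic setting actually surfaces memorized training text depends on the model; the sketch only shows the sampling mechanics.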

  • Alsup? Is this the same judge who also presided over Oracle v. Google, over the use of Java in Android? That guy really does his homework on the cases he presides over; he learned how to code to see whether APIs are copyrightable.

    As for the ruling, I'm not in favour of AI training on copyrighted material, but I can see where the judgement is coming from. I think it's a matter of what's really copyrightable: the actual text or images, or the abstract knowledge in the material. In other words, if you were to read a book and then write a summary of a section of it in your own words, or orally describe what you learned from the book to someone else, would that count as copyright infringement? Or if you watched a movie and then described your favourite scenes to your friends?

    Perhaps a case could be made that AI training on copyrighted materials is not the same as humans consuming the copyrighted material, and therefore it should have a different provision in copyright law. I'm no lawyer, but I'd assume that current copyright law works on the basis that humans do not generally have perfect recall of the copyrighted material they consume. Then again, a counter-argument could be that neither does the AI, given its tendency to hallucinate. However, it still has superior recall compared to humans, and perhaps that could be grounds for amending copyright law to cover AI training?

    Your last paragraph would be the ideal solution in an ideal world, but I don't think anything like this could happen under the current political and economic structures.

    First, it's super easy to hide all of this, and enforcement would be very difficult even domestically. Second, because we're in an AI race, no one would ever put themselves at such a disadvantage unless there's real damage, not economic copyright juggling.

    People need to come to terms with these facts so we can address real problems rather than blowing against the wind with all this whining we see on Lemmy. There are actual things we can do.

  • Terrible judgement.

    Turn the K value down on the model and it reproduces text near verbatim.

    Ah the Schrödinger's LLM - always hallucinating and also always accurate

  • Alsup? Is this the same judge who also presided over Oracle v. Google, over the use of Java in Android? That guy really does his homework on the cases he presides over; he learned how to code to see whether APIs are copyrightable.

    As for the ruling, I'm not in favour of AI training on copyrighted material, but I can see where the judgement is coming from. I think it's a matter of what's really copyrightable: the actual text or images, or the abstract knowledge in the material. In other words, if you were to read a book and then write a summary of a section of it in your own words, or orally describe what you learned from the book to someone else, would that count as copyright infringement? Or if you watched a movie and then described your favourite scenes to your friends?

    Perhaps a case could be made that AI training on copyrighted materials is not the same as humans consuming the copyrighted material, and therefore it should have a different provision in copyright law. I'm no lawyer, but I'd assume that current copyright law works on the basis that humans do not generally have perfect recall of the copyrighted material they consume. Then again, a counter-argument could be that neither does the AI, given its tendency to hallucinate. However, it still has superior recall compared to humans, and perhaps that could be grounds for amending copyright law to cover AI training?

    Agree 100%.

    Hope we can refactor this whole copyright/patent concept soon…

    It's more of a pain for artists, creators, releasers, etc.

    I see it with EDM: I work as a label and sometimes produce a bit myself.

    Most artists work with samples, presets, etc., and keeping track of who worked on what and who owns what percentage of what just takes the joy out of creating…

    Same for game design: you have a vision for your game, make a PoC, and then have to change the whole game because of stupid patent shit not allowing you to, e.g., land on a horse and immediately ride it, or throw stuff at things to catch them…

  • Agree 100%.

    Hope we can refactor this whole copyright/patent concept soon…

    It's more of a pain for artists, creators, releasers, etc.

    I see it with EDM: I work as a label and sometimes produce a bit myself.

    Most artists work with samples, presets, etc., and keeping track of who worked on what and who owns what percentage of what just takes the joy out of creating…

    Same for game design: you have a vision for your game, make a PoC, and then have to change the whole game because of stupid patent shit not allowing you to, e.g., land on a horse and immediately ride it, or throw stuff at things to catch them…

    I'm inclined to agree. I hate AI, and I especially hate artists and other creatives being shafted, but I'm increasingly doubtful that copyright is an effective way to ensure that they get their fair share (whether we're talking about AI or otherwise).

  • Bad judgement.

    Any reason to say that other than that it didn't give the result you wanted?

  • I'm inclined to agree. I hate AI, and I especially hate artists and other creatives being shafted, but I'm increasingly doubtful that copyright is an effective way to ensure that they get their fair share (whether we're talking about AI or otherwise).

    In an ideal world, there would be something like a universal basic income, which would reduce the pressure on artists to generate enough income from their art. That would allow artists to make art that is less mainstream and more unique, and thus, in my opinion, would make it possible to weaken copyright laws.

    Well, that's how I would try to start the change.

  • Ah the Schrödinger's LLM - always hallucinating and also always accurate

    "hallucination refers to the generation of plausible-sounding but factually incorrect or nonsensical information"

    Is an output a hallucination when the training data involved in that output included factually incorrect data? Suppose my input is "is the world flat" and then an LLM, allegedly, accurately generates a flat-earther's writings saying it is.

  • Your last paragraph would be the ideal solution in an ideal world, but I don't think anything like this could happen under the current political and economic structures.

    First, it's super easy to hide all of this, and enforcement would be very difficult even domestically. Second, because we're in an AI race, no one would ever put themselves at such a disadvantage unless there's real damage, not economic copyright juggling.

    People need to come to terms with these facts so we can address real problems rather than blowing against the wind with all this whining we see on Lemmy. There are actual things we can do.

    One way I could see this being enforced is by mandating that AI models not respond to questions that would result in speaking about a copyrighted work, similar to how mainstream models won't speak about vulgar or controversial topics (a naive sketch of such a filter follows this comment).

    But yeah, realistically, it's unlikely that any judge would rule in that favour.
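
    For what it's worth, here is a deliberately naive sketch of the kind of filter being imagined: refuse any reply that shares a long verbatim word window with a protected text. Everything here is hypothetical, and production guardrails are far more involved than n-gram matching:

    ```python
    def ngrams(text, n=8):
        """All n-word windows in a text, lowercased."""
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    def overlaps_protected_work(reply, protected_corpus):
        """Flag the reply if any 8-word window appears verbatim in a protected work."""
        reply_grams = ngrams(reply)
        return any(reply_grams & ngrams(work) for work in protected_corpus)

    protected = ["it was the best of times it was the worst of times"]
    reply = "He said it was the best of times it was the worst of times today"
    print(overlaps_protected_work(reply, protected))  # True -> refuse to answer
    ```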

  • This post did not contain any content.

    It sounds like the precedent has been set

  • This post did not contain any content.

    Grab em by the intellectual property! When you're a multi-billion dollar corporation, they just let you do it!

  • This post did not contain any content.

    I’ll leave this here from another post on this topic…

  • Ah the Schrödinger's LLM - always hallucinating and also always accurate

    Accuracy and hallucination are two ends of a spectrum.

    If you turn hallucinations to a minimum, the LLM will faithfully reproduce what's in the training set, but the result will not fit the query very well.

    The other option is to turn the so-called temperature up, which results in replies that fit the query better, but the hallucinations go up as well.

    In the end it's a balance between getting responses that are closer to the dataset (factual) or closer to the query (creative).
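
    A minimal sketch of that temperature knob, with toy logits rather than a real model. The factual-versus-creative trade-off is the commenter's framing; what the math itself does is sharpen or flatten the distribution:

    ```python
    import numpy as np

    def sample_with_temperature(logits, temperature, rng=np.random.default_rng()):
        """Divide the logits by the temperature before the softmax.
        Low T sharpens the distribution toward the single likeliest token;
        high T flattens it, so unlikely tokens get picked more often."""
        scaled = np.asarray(logits) / temperature
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()
        return int(rng.choice(len(probs), p=probs))

    logits = [4.0, 2.0, 1.0, 0.5]
    print([sample_with_temperature(logits, 0.1) for _ in range(8)])  # almost always token 0
    print([sample_with_temperature(logits, 2.0) for _ in range(8)])  # spread across tokens
    ```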

  • Ah the Schrödinger's LLM - always hallucinating and also always accurate

    There is nothing intelligent about "AI" as we call it. It parrots based on probability. If you remove the randomness value from the model, it parrots the same thing every time based on its weights, and if the weights were trained on Harry Potter, it will consistently give you giant chunks of Harry Potter verbatim when prompted (a minimal sketch of that determinism follows this comment).

    Most of the LLM services attempt to avoid this by adding arbitrary randomness values to churn the soup. But this is also inherently part of the cause of hallucinations, as the model cannot preserve a single correct response as always the right way to answer a certain query.

    LLMs are insanely "dumb"; they're just light-speed parrots. The fact that Meta and these other giant tech companies claim it's not theft because they sprinkle in some randomness just obscures the reality that their models are derivative of the work of organizations like the BBC and Wikipedia, while also being dependent on the works of tens of thousands of authors for their corpus of language.

    In short, there was an ethical way to train these models. But that would have been slower. And the court just basically gave them a pass on theft. Facebook would have been entirely in the clear had it not stored the books in a dataset, which in itself is insane.

    I wish I knew when I was younger that stealing is wrong, unless you steal at scale. Then it's just clever business.
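
    A minimal sketch of the determinism claim above, using the Hugging Face transformers library. The small gpt2 checkpoint is an assumed stand-in; greedy decoding is deterministic for any causal LM, though whether it actually regurgitates training text depends on what the weights memorized:

    ```python
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    inputs = tok("The boy wizard's school was called", return_tensors="pt")

    # do_sample=False selects greedy decoding: pure argmax over the weights,
    # no randomness, so the continuation is identical on every run.
    out = model.generate(**inputs, do_sample=False, max_new_tokens=20)
    print(tok.decode(out[0], skip_special_tokens=True))
    ```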

  • There is nothing intelligent about "AI" as we call it. It parrots based on probability. If you remove the randomness value from the model, it parrots the same thing every time based on its weights, and if the weights were trained on Harry Potter, it will consistently give you giant chunks of Harry Potter verbatim when prompted.

    Most of the LLM services attempt to avoid this by adding arbitrary randomness values to churn the soup. But this is also inherently part of the cause of hallucinations, as the model cannot preserve a single correct response as always the right way to answer a certain query.

    LLMs are insanely "dumb"; they're just light-speed parrots. The fact that Meta and these other giant tech companies claim it's not theft because they sprinkle in some randomness just obscures the reality that their models are derivative of the work of organizations like the BBC and Wikipedia, while also being dependent on the works of tens of thousands of authors for their corpus of language.

    In short, there was an ethical way to train these models. But that would have been slower. And the court just basically gave them a pass on theft. Facebook would have been entirely in the clear had it not stored the books in a dataset, which in itself is insane.

    I wish I knew when I was younger that stealing is wrong, unless you steal at scale. Then it's just clever business.

    Except that breaking copyright is not stealing and never was. It's hard to believe that you'd ever see copyright advocates on FOSS and decentralized networks like Lemmy; it's like people had their minds hijacked because "big tech is bad".

  • Except that breaking copyright is not stealing and never was. It's hard to believe that you'd ever see copyright advocates on FOSS and decentralized networks like Lemmy; it's like people had their minds hijacked because "big tech is bad".

    What name do you have for the activity of making money from someone else's work or data without their consent or any compensation? If the tech were just tech, it wouldn't need any non-consenting human input to work properly. These are just companies feeding on various types of data; if the justice system doesn't protect an author, what do you think would happen if these same models started feeding off user data instead? Tech is good; the ethics are not.
