Judge backs AI firm over use of copyrighted books

  • If you try to sell "the new adventures of Doctor Strange, Jonathan Strange and Magic Man," existing copyright laws are sufficient and will stop it. Really, training should be regulated by the same laws as reading. If they can get the material through legitimate means it should be fine, but pulling data that is not freely accessible should be theft, as it is already.

    I have a freely accessible document under a CC license that states it is not to be used commercially. This is commercial use. Your policy would still allow that document to be used, since it is accessible. This kind of policy discourages me from sharing my works easily, because others profit from my efforts and my works are more likely to be attributed to a corporate beast I want nothing to do with than to me.

    I'm all for copyright reform and simpler copyright law, but these companies need to be held to standard copyright rules and not just made up modifications.
    I'm convinced a perfectly decent LLM could be built without violating copyrights.

    I'd also be ok sharing works with a not for profit open source LLM and I think others might as well.

  • I agree that we need open source and need to emancipate ourselves. The main issue I see is that the entire approach doesn't work. Take the internet as an example. It's meant to be very open, connect everyone and enable them to share information freely. It is set up to be a level playing field... Now look at what that leads to: trillion-dollar mega-corporations, privacy issues everywhere and big data silos. That's what the approach promotes. I agree with the goal. But in my opinion the approach will turn out to lead to less open source and more control by rich companies. And that's not what we want.

    Plus nobody even opens the walled gardens. Last time I looked, Reddit wanted money for data. Other big platforms aren't open either. And there's kind of a small war going on between the scrapers and crawlers and the countermeasures. So it's not as if it's open as of now.

    A lot of our laws are indeed obsolete. I think the best solution would be to force copyleft licenses on anything using publicly created data.

    But I'll take the wild west we have now, with no walls, over any kind of copyright dystopia. Reddit did successfully sell its data to Google for 60 million. Right now, you can legally scrape anything you want off Reddit; it is an open garden in every sense of the word (even if they don't like it). It's a lot more legal than using pirated books, but Google still bet 60 million that copyright laws would swing broadly in their favor.

    I think it's very foolhardy to even hint at a pro-copyright stance right now. There is a very real chance of AI getting monopolized, and this is how they will do it.

    I agree a copyright dystopia wouldn't be any good. Just mind that wild west or law of the jungle is the "right of the strongest". You're advantaging big companies and disadvantaging smaller players or people with ethics or who are more open/transparent.

    And I don't think the legality of web scraping is the biggest issue. Sure, I could maybe do it if it were possible. But I'm occasionally doing some weird stuff and most services have countermeasures in place. In reality I just can't scrape Reddit. Lots of bots and crawlers just don't work any more. I'm getting rate limited left and right by all the big platforms. Lots of things require an account these days, and services are quick to ban me for "suspicious activity". It's barely possible to download YouTube videos these days. So, no. I can't. While Google can just pay for it and have the data.

    Also, Reddit isn't really the benevolent underdog here. They're a big company as well. And they're not selling their data... They're selling their users' data. They're mainly monetizing other people's creations.

  • If you try to sell "the new adventures of Doctor Strange, Jonathan Strange and Magic Man," existing copyright laws are sufficient and will stop it. Really, training should be regulated by the same laws as reading. If they can get the material through legitimate means it should be fine, but pulling data that is not freely accessible should be theft, as it is already.

    as it is already

    Copies of copyrighted works cannot be regarded as "stolen property" for the purposes of a prosecution under the National Stolen Property Act of 1934.

    https://en.m.wikipedia.org/wiki/Dowling_v._United_States_(1985)

  • used to train both commercial

    commercial training is, in this case, stealing people's work for commercial gain

    and open source language models

    so, uh, let us train open-source models on open-source text. There's so much of it that there's no need to steal.

    ?

    I'm not sure why you added a question mark at the end of your statement.

    I'm not sure why you added a question mark at the end of your statement.

    I was questioning whether or not you would see that as a benefit. Clearly you don't.

    Are you also against libraries letting people borrow books since those are also lost sales for the authors, or are you just a luddite?

    libraries letting people borrow books

    This is so far from analogous that it's almost a non sequitur.

    are you just a luddite?

    No, and you don't even believe such nonsense. You're grasping, ineffectively.

  • Wait, the authors argued that? Why? That's literally the opposite of the thing they needed to argue.

  • As a civil matter, the publishing houses are more likely to get the full money if Anthropic stays in business (and does well). So it might be bad, but I'm really skeptical about bankruptcy (and I'm not hearing anyone seriously floating it?)

    Depending on the type of bankruptcy, the business can still operate; all their profits would just go towards paying off their debts.

  • C could still bankrupt the company depending on how the trial goes. They pirated a lot of books.

    It might not be that bad. Most 'damage' (as publishers see it) comes from distribution, not the download itself. Depending on how they acquired the books, it might not be much of a problem.

  • Plaintiffs made that argument and the judge shoots it down pretty hard. That kind of competition isn't what copyright protects against. He makes an analogy with teachers teaching children to write fiction: they are using existing fantasy to create MANY more competitors on the fiction market. Could an author use copyright to challenge that use?

    Would love to hear your thoughts on the ruling itself (it's linked by Reuters).

    Orcs and dwarves (with a v) are creations of Tolkien; if the fantasy stories include them, it's a violation of copyright, the same as including Mickey Mouse.

    My argument would have been to ask the AI for the bass line to Queen & David Bowie's Under Pressure, then refer to that as a reproduction of copyrighted material. But then again, AI companies probably have better lawyers than Vanilla Ice.

  • An 80 year old judge on their best day couldn't be trusted to make an informed decision. This guy was either bought or confused into his decision. Old people gotta go.

    Funny, there are a lot of people on Lemmy itself (especially around dbzer0) who would agree with the judge wholeheartedly.

  • Orcs and dwarves (with a v) are creations of Tolkien; if the fantasy stories include them, it's a violation of copyright, the same as including Mickey Mouse.

    My argument would have been to ask the AI for the bass line to Queen & David Bowie's Under Pressure, then refer to that as a reproduction of copyrighted material. But then again, AI companies probably have better lawyers than Vanilla Ice.

    The students read Tolkien, then invent their own settings. The judge thinks this is similar to how Claude works. Neither I nor, I suspect, the judge meant that the students were reusing world-building wholesale.
