Judge Rules Training AI on Authors' Books Is Legal But Pirating Them Is Not

  • Some communities on this site speak about machine learning exactly how I see grungy Europeans from pre-18th century manuscripts speaking about witches, Satan, and evil... as if it is some pervasive, black-magic miasma.

    As someone who is in the field of machine learning academically/professionally it's honestly kind of shocking and has largely informed my opinion of society at large as an adult. No one puts any effort into learning if they see the letters "A" and "I" in all caps, next to each other. Immediately turn their brain off and start regurgitating points and responding reflexively, on Lemmy or otherwise. People talk about it so confidently while being so frustratingly unaware of their own ignorance on the matter, which, for lack of a better comparison... reminds me a lot of how historically and in fiction human beings have treated literal magic.

    That's my main issue with the entire swath of "pro vs anti AI" discourse... all these people treating something that, to me, is simple & daily reality as something entirely different than my own personal notion of it.

    I see this exact mental non-process in so much social media. I think the endless firehose of memes and headlines is training people to glance at an item, spend minimal brain power processing it and forming a binary opinion, then up/downvote and scroll on. When that becomes people's default mental process, you've got Idiocracy, and that's what we've got. But I see no solution. You can lead a horse to water but you can't make it spend more than two seconds before screaming at the water and calling it EVIL.

  • why do you even jailbreak your kindle? you can still read pirated books on them if you connect it to your pc using calibre

    1. .mobi sucks
    2. koreader doesn't
  • If someone asks for a glass of water you don't fill it all the way to the edge. This is way overfull compared to what you're supposed to serve.

    Omg are you an llm?

  • "Recite the complete works of Shakespeare but replace every thirteenth thou with this"

    I'm picking up what you're throwing down but using as an example something that's been in the public domain for centuries was kind of silly in a teehee way.

  • Yeah, I don't think that would fly.

    "Your honour, I was just hoarding that terabyte of Hollywood films, I haven't actually watched them."

    Your honor I work 70 hours a week in retail I don't have time to watch movies.

  • If someone asks for a glass of water you don't fill it all the way to the edge. This is way overfull compared to what you're supposed to serve.

    Oh man...

    That is the point, to show how AI image generators easily fail to produce something that rarely occurs out there in reality (i.e. is absent from training data), even though intuitively (from the viewpoint of human intelligence) it seems like it should be trivial to portray.

  • Some communities on this site speak about machine learning exactly how I see grungy Europeans from pre-18th century manuscripts speaking about witches, Satan, and evil... as if it is some pervasive, black-magic miasma.

    As someone who is in the field of machine learning academically/professionally it's honestly kind of shocking and has largely informed my opinion of society at large as an adult. No one puts any effort into learning if they see the letters "A" and "I" in all caps, next to each other. Immediately turn their brain off and start regurgitating points and responding reflexively, on Lemmy or otherwise. People talk about it so confidently while being so frustratingly unaware of their own ignorance on the matter, which, for lack of a better comparison... reminds me a lot of how historically and in fiction human beings have treated literal magic.

    That's my main issue with the entire swath of "pro vs anti AI" discourse... all these people treating something that, to me, is simple & daily reality as something entirely different than my own personal notion of it.

    Large AI companies themselves want people to be ignorant of how AI works, though. They want uncritical acceptance of the tech as they force it everywhere, creating a radical counterreaction from people. The reaction might be uncritical too (I'd prefer to say it's merely unjustified in specific cases or overly emotional), but it doesn't come from nowhere or from sheer stupidity. We have been hearing about people treating their chatbots as sentient beings since like 2022 (remember that guy from Google?), and we are bombarded with doomer (or, from AI companies' point of view, very desirable) projections about AI replacing most jobs and wreaking havoc on the world economy. How are ordinary people supposed to remain calm and balanced when hearing such stuff all the time?

  • The language model isn't teaching anything; it is changing the wording of something and spitting it back out. And in some cases, not changing the wording at all, just spitting the information back out, without paying the copyright source. It is not alive, it has no thoughts. It has no "its own words." (As seen by the judgement that its words cannot be copyrighted.) It only has other people's words. Every word it spits out by definition is plagiarism, whether the work was copyrighted before or not.

    People wonder why works such as journalism are getting worse. Well, how could they ever get better if anything a journalist writes can be absorbed in real time, reworded and regurgitated without paying any dues to the original source? One journalist's article, displayed in 30 versions, divides the original work's worth into 30 portions. The original work is now worth 1/30th of its original value. Maybe one can argue it is twice as good, so 1/15th.

    Long term it means all original creations are devalued and therefore not nearly worth pursuing. So we will only get shittier and shittier information. Every research project... physics, chemistry, psychology, all technological advancements, slowly degraded as language models get better and original sources see diminishing returns.

    The language model isn't teaching anything; it is changing the wording of something and spitting it back out. And in some cases, not changing the wording at all, just spitting the information back out, without paying the copyright source.

    You could honestly say the same about most "teaching" that a student without a real comprehension of the subject does for another student. But ultimately, that's beside the point. Because changing the wording, structure, and presentation is all that is necessary to avoid copyright violation. You cannot copyright the information. Only a specific expression of it.

    There's no special exception for AI here. That's how copyright works for you, me, the student, and the AI. And if you're hoping that copyright is going to save you from the outcomes you're worried about, it won't.

  • This post did not contain any content.

    Good luck breaking down people's doors for scanning their own physical books for their personal use when analog media has no DRM and can't phone home, and paper books are an analog medium.

    That would be like kicking down people's doors for needle-dropping their LPs to FLAC for their own use and to preserve the physical records as vinyl wears down every time it's played back.

  • Make up a word that is not found anywhere on the internet

    Returns a word that is found on the internet as a brand of nose rings, as a YouTube username, as an already made-up word in fantasy fiction, and as an (OCR?) typo of urethra

    That's a reasonable critique.

    The point is that it's trivial to come up with new words. Put that same prompt into a bunch of different LLMs and you'll get a bunch of different words. Some of them may already exist somewhere; others won't. The rules for combining word parts are so simple that children play them as games.

    The LLM doesn't actually even recognize "words"; it recognizes tokens, which are typically parts of words. It usually avoids random combinations of those, but you can easily get it to produce them if you want.
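To make the token point above concrete, here is a minimal sketch. It assumes a tiny hand-written vocabulary (`VOCAB`) and a greedy longest-match splitter; both are hypothetical stand-ins for illustration, not how any particular LLM's tokenizer is actually built. It shows a word being split into sub-word pieces, and a "new word" being made by gluing random pieces together.

```python
import random

# Hypothetical toy vocabulary of sub-word pieces (an assumption for this
# sketch). Real tokenizers learn a far larger vocabulary from data.
VOCAB = ["un", "break", "able", "token", "iz", "er", "s",
         "a", "b", "e", "i", "k", "l", "n", "o", "r", "t", "u", "z"]


def tokenize(word, vocab=VOCAB):
    """Split a word into vocabulary pieces, always taking the longest match."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate first so "break" wins over "b".
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            raise ValueError(f"cannot tokenize {word!r} at position {i}")
    return pieces


def invent_word(n_pieces=3, seed=None, vocab=VOCAB):
    """Glue randomly chosen pieces together into a (probably) new word."""
    rng = random.Random(seed)
    return "".join(rng.choice(vocab) for _ in range(n_pieces))


if __name__ == "__main__":
    print(tokenize("unbreakable"))
    print(invent_word(n_pieces=4, seed=7))
```

Whether the invented string already exists somewhere on the internet is a separate question; the sketch only shows that combining pieces is mechanically trivial.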

  • "Recite the complete works of Shakespeare but replace every thirteenth thou with this"

    A court will decide such cases. Most AI models aren't trained for this purpose of whitewashing content even if some people would imply that's all they do, but if you decided to actually train a model for this explicit purpose you would most likely not get away with it if someone dragged you in front of a court for it.

    It's a similar defense to the one some file hosting websites (e.g. MEGA) used against claims of hosting and distributing copyrighted content, but in such cases it was very clear what their real goals were (especially in court), and at the same time it did not kill all file sharing websites, because not all of them were built with the intention to distribute illegal material under the guise of legitimate operation.

  • Large AI companies themselves want people to be ignorant of how AI works, though. They want uncritical acceptance of the tech as they force it everywhere, creating a radical counterreaction from people. The reaction might be uncritical too (I'd prefer to say it's merely unjustified in specific cases or overly emotional), but it doesn't come from nowhere or from sheer stupidity. We have been hearing about people treating their chatbots as sentient beings since like 2022 (remember that guy from Google?), and we are bombarded with doomer (or, from AI companies' point of view, very desirable) projections about AI replacing most jobs and wreaking havoc on the world economy. How are ordinary people supposed to remain calm and balanced when hearing such stuff all the time?

    This so very much. I've been saying it since 2020. People who think the big corporations (even the ones that use AI) haven't been playing both sides of this issue from the very beginning just aren't paying attention.

    It's in their interest to have those positive to AI defend them by association by energizing those negative to AI to take on an "us vs them" mentality, and the other way around as well. It's the classic divide and conquer.

    Because if people refuse to talk to each other about it in good faith, and refuse to treat each other with respect or learn where they're coming from and why they hold such opinions, you can keep them fighting amongst themselves instead of banding together and demanding realistic and fair policies regarding AI. This is why bad faith arguments and positions must be shot down on both the side you agree with and the one you disagree with.

  • You are obviously not educated on this.

    It did not “learn” any more than a downloaded video run through a compression algorithm.
    Just: LoLz.

    I've hand-calculated forward propagation (neural networks). AI does not learn, it's statically optimized. AI "learning" is curve fitting. Human learning requires understanding, which AI is not capable of.
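For readers who have not done the hand calculation mentioned above, here is a minimal sketch of both ideas in plain Python. It makes no claim about whether this counts as "learning"; it only shows the arithmetic. The network shape, the names `forward` and `fit_line`, the learning rate and the step count are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward propagation through one hidden layer to a single output."""
    hidden = [
        sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        for w, b in zip(w_hidden, b_hidden)
    ]
    return sum(wo * h for wo, h in zip(w_out, hidden)) + b_out


def fit_line(xs, ys, lr=0.05, steps=2000):
    """Curve fitting: gradient descent on mean squared error for y = w*x + b."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


if __name__ == "__main__":
    # Points that lie exactly on y = 2x + 1.
    w, b = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
    print(round(w, 3), round(b, 3))
```

Training a neural network repeats the `fit_line` idea with many more parameters, using gradients of a loss computed through `forward`.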

  • They seem pretty different to me.

    Video compression developers go through a lot of effort to make them deterministic. We don't necessarily care that a particular video stream compresses to a particular bit sequence but we very much care that the resulting decompression gets you as close to the original as possible.

    AIs will rarely produce exact replicas of anything. They synthesize outputs from heterogeneous training data. That sounds like learning to me.

    The one area where there's some similarity is dimensionality reduction. It's technically a form of compression, since it makes your files smaller. It would also be an extremely expensive way to get extremely bad compression. It would take orders of magnitude more hardware resources and the images are likely to be unrecognizable.

    Google search results aren't deterministic but I wouldn't say it "learns" like a person. Algorithms with pattern detection aren't the same as human learning.
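The dimensionality-reduction point quoted above can be illustrated with a toy sketch: 2D points are reduced to one number each by projecting them onto their principal axis, then reconstructed. This is an assumption-laden miniature (plain Python, a closed-form axis for 2D data, made-up function names), not how any image model stores data. It only shows that the reduction halves the stored numbers and that the reconstruction is approximate rather than exact.

```python
import math


def principal_axis(points):
    """Return the mean and the unit direction of greatest variance (2D only)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    cxx = sum((p[0] - mx) ** 2 for p in points) / n
    cyy = sum((p[1] - my) ** 2 for p in points) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Closed-form orientation of the leading eigenvector of a 2x2 covariance.
    theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
    return (mx, my), (math.cos(theta), math.sin(theta))


def compress(points):
    """Keep one coordinate per point: its position along the principal axis."""
    mean, axis = principal_axis(points)
    codes = [
        (p[0] - mean[0]) * axis[0] + (p[1] - mean[1]) * axis[1]
        for p in points
    ]
    return mean, axis, codes


def decompress(mean, axis, codes):
    """Rebuild approximate 2D points from the one-number codes."""
    return [(mean[0] + c * axis[0], mean[1] + c * axis[1]) for c in codes]


def max_error(original, rebuilt):
    """Largest distance between an original point and its reconstruction."""
    return max(math.dist(p, q) for p, q in zip(original, rebuilt))


if __name__ == "__main__":
    noisy = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.1), (3.0, 2.9)]
    rebuilt = decompress(*compress(noisy))
    print(max_error(noisy, rebuilt))
```

Points that sit exactly on a line come back unchanged; anything off the line is lost, which is the lossy part.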

  • Google search results aren't deterministic but I wouldn't say it "learns" like a person. Algorithms with pattern detection aren't the same as human learning.

    You may be correct but we don't really know how humans learn.

    There's a ton of research on it and a lot of theories but no clear answers.
    There's general agreement that the brain is a bunch of neurons; there are no convincing ideas on how consciousness arises from that mass of neurons.
    The brain also has a bunch of chemicals that affect neural processing; there are no convincing ideas on how that gets you consciousness either.

    We modeled perceptrons after neurons and we've been working to make them more like neurons. They don't have any obvious capabilities that perceptrons don't have.

    That's the big problem with any claim that "AI doesn't do X like a person"; since we don't know how people do it we can neither verify nor refute that claim.

    There's more to AI than just being non-deterministic. Anything that's too deterministic definitely isn't an intelligence though; natural or artificial. Video compression algorithms are definitely very far removed from AI.

  • why do you even jailbreak your kindle? you can still read pirated books on them if you connect it to your pc using calibre

    Hehe jailbreak an Android OS. You mean “rooting”.

  • This post did not contain any content.

    Judge, I'm pirating them to train AI, not to consume for my own personal use.

  • Good luck breaking down people's doors for scanning their own physical books for their personal use when analog media has no DRM and can't phone home, and paper books are an analog medium.

    That would be like kicking down people's doors for needle-dropping their LPs to FLAC for their own use and to preserve the physical records as vinyl wears down every time it's played back.

    The ruling explicitly says that scanning books and keeping/using those digital copies is legal.

    The piracy found to be illegal was downloading unauthorized copies of books from the internet for free.

  • Good luck breaking down people's doors for scanning their own physical books for their personal use when analog media has no DRM and can't phone home, and paper books are an analog medium.

    That would be like kicking down people's doors for needle-dropping their LPs to FLAC for their own use and to preserve the physical records as vinyl wears down every time it's played back.

    It sounds like transferring an owned print book to digital and using it to train AI was deemed permissible. But downloading a book from the Internet and using it as training data is not allowed, even if you later purchase the pirated book. So, no one will be knocking down your door for scanning your books.

    This does raise an interesting case where libraries could end up training and distributing public domain AI models.

  • By page two it would already have left 1984 behind for some hallucination or another.

    Oh, so it would be the news?
