
Judge Rules Training AI on Authors' Books Is Legal But Pirating Them Is Not

Technology
  • It sounds like transferring an owned print book to digital and using it to train AI was deemed permissible. But downloading a book from the Internet and using it as training data is not allowed, even if you later purchase the pirated book. So, no one will be knocking down your door for scanning your books.

    This does raise an interesting case where libraries could end up training and distributing public domain AI models.

    I would actually be okay with libraries having those AI services. Even if they were available only for a fee it would be absurdly low and still waived for people with low or no income.

  • You think that $150,000, or roughly 250 weeks of full-time pretax wages at $15 an hour, is a reasonable fine for making a copy of one book which does no material harm to the copyright holder?

    No I don’t, but we’re not talking about a single copy of one book, and it is grovellingly insidious to imply that we are.

    We are talking about a company taking the work of an author, of thousands of authors, and using it as the backbone of a machine whose goal is to make those authors obsolete.

    When the people who own the slop-machine are making millions of dollars off the back of stolen works, they can very much afford to pay those authors. If you can’t afford to run your business without STEALING, then your business is a pile of flaming shit that deserves to fail.

  • None of the above. Every professional in the world, including me, owes our careers to looking at examples of other people's work and incorporating their work into our own work without paying a penny for it. Freely copying and imitating what we see around us has been a human norm for thousands of years - in a process known as "the spread of civilization". Relatively recently it was demonized - for purely business reasons, not moral ones - by people who got rich selling copies of other people's work and paying them a pittance known as a "royalty". That little piece of bait on the hook has convinced a lot of people to put a black hat on behavior that had been considered normal forever. If angry modern enlightened justice warriors want to treat a business concept like a moral principle and get all sweaty about it, that's fine with me, but I'm more of a traditionalist in that area.

    Nobody who is mad at this situation thinks that taking inspiration, riffing on, or referencing other people’s work is the problem when a human being does it. When a person writes, there is intention behind it.

    The issue is when a business, owned by those people you think ‘demonised’ inspiration, takes the works of authors and mulches them into something they lovingly named “The Pile”, in order to create derivative slop off the backs of creatives.

    When you, as a “professional”, ask AI to write you a novel, who is being inspired? Who is making the connections between themes? Who is carefully crafting the text to pay loving reference to another author’s work? Not you. Not the algorithm that is guessing what word to shit out next based on math.

    These businesses have tricked you into thinking that what they are doing is noble.

  • One point I would refute here is determinism. AI models are, by default, deterministic. They are made from deterministic parts and "any combination of deterministic components will result in a deterministic system". Randomness has to be externally injected into e.g. current LLMs to produce 'non-deterministic' output.

    There is the notable exception of newer models like GPT-4, which seemingly produce non-deterministic outputs (i.e. give it the same sentence and it produces different outputs even with its temperature set to 0) - but my understanding is this is due to floating point inaccuracies which lead to different token selection, and is thus a function of our current processor architectures and not inherent in the model itself.

    You're correct that a collection of deterministic elements will produce a deterministic result.

    LLMs produce a probability distribution over next tokens and then randomly select one of them. That's where the non-determinism enters the system. Even if you set the temperature to 0 you can still get some residual randomness: GPU kernels don't always execute floating-point operations in the same order, so two logits that are essentially tied can round differently from run to run, and which token wins becomes effectively a hardware-level coin toss (a minimal sketch of the sampling step follows this exchange).

    You can test this empirically. Set the temperature to 0 and ask it, "give me a random number". You'll rarely get the same number twice in a row, no matter how similar you try to make the starting conditions.
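
To make that mechanism concrete, here is a minimal sketch of the sampling step in NumPy. It is purely illustrative (the function name and logit values are made up, and real inference stacks run this on the GPU): temperature 0 reduces to a deterministic argmax, any positive temperature injects randomness, and the residual temperature-0 wobble in deployed systems comes from the logits themselves varying slightly between runs.

```python
import numpy as np

def next_token(logits, temperature=1.0, rng=None):
    """Pick a next-token id from raw logits.

    temperature == 0 -> greedy argmax (deterministic given identical logits).
    temperature  > 0 -> softmax sampling (this is where randomness is injected).
    """
    logits = np.asarray(logits, dtype=np.float64)
    if temperature == 0:
        return int(np.argmax(logits))               # no randomness at all
    rng = rng or np.random.default_rng()
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())           # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

logits = [2.1, 2.0999999, -1.0]                     # two near-tied candidates

print(next_token(logits, temperature=0))            # always 0: greedy is reproducible
print([next_token(logits, temperature=1.0) for _ in range(10)])  # varies run to run

# The "temperature 0 but still random" effect seen in deployed models comes from
# the logits themselves wobbling: parallel GPU reductions can round the two
# near-tied values differently between runs, flipping which one argmax picks.
```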

  • I've hand-calculated forward propagation (neural networks). AI does not learn, it's statically optimized. AI "learning" is curve fitting (a small sketch of what that optimization looks like follows this exchange). Human learning requires understanding, which AI is not capable of.

    Human learning requires understanding, which AI is not capable of.

    How could anyone know this?

    Is there some test of understanding that humans can pass and AIs can't? And if there are humans who can't pass it, do we consider them unintelligent?

    We don't even need to set the bar that high. Is there some definition of "understanding" that humans meet and AIs don't?
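
For concreteness, here is a toy sketch of what that "curve fitting" / statistical optimization looks like: a single linear "neuron" fit by gradient descent on made-up data (the names, learning rate, and dataset are all illustrative, not taken from any real system).

```python
import numpy as np

# Toy "network": y = w*x + b, trained by gradient descent on mean squared error.
# This is the sense in which training is curve fitting: adjust parameters to
# minimize a loss measured over a dataset, nothing more.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3.0 * x + 0.5 + rng.normal(scale=0.1, size=100)   # samples from a noisy line

w, b, lr = 0.0, 0.0, 0.1
for step in range(500):
    pred = w * x + b                  # forward propagation
    err = pred - y
    grad_w = 2 * np.mean(err * x)     # gradients of the mean squared error
    grad_b = 2 * np.mean(err)
    w -= lr * grad_w                  # parameter update
    b -= lr * grad_b

print(round(w, 2), round(b, 2))       # ~3.0 and ~0.5: the curve has been fit
```

Whether optimization of this kind, scaled up enormously, deserves the word "learning" is exactly what the rest of the thread argues about; the sketch only shows what the optimization step itself does.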

  • If this is the ruling which causes you to lose trust that any legal system (not just the US') aligns with morality, then I have to question where you've been all this time.

    I could have been more clear, but it wasn't my intention to imply that this particular case is the turning point.

  • No I don’t, but we’re not talking about a single copy of one book, and it is grovellingly insidious to imply that we are.

    We are talking about a company taking the work of an author, of thousands of authors, and using it as the backbone of a machine whose goal is to make those authors obsolete.

    When the people who own the slop-machine are making millions of dollars off the back of stolen works, they can very much afford to pay those authors. If you can’t afford to run your business without STEALING, then your business is a pile of flaming shit that deserves to fail.

    Except it isn't, because the judge dismissed that part of the suit, saying that people have every right to digitise and train on works they have a legitimate copy of. So those damages are for making the unauthorised copy, per book.

    And it is not STEALING, as you put it; it is making an unauthorised copy. No one loses anything when a copy is made: if I STEAL your phone, you no longer have that phone. I do find it sad how many people have swallowed the capitalist IP-maximalist stance and somehow convinced themselves that advocating for Disney and the publishing cartel being allowed to dictate how people use works they have is somehow sticking up for the little guy.

  • Nobody who is mad at this situation thinks that taking inspiration, riffing on, or referencing other people’s work is the problem when a human being does it. When a person writes, there is intention behind it.

    The issue is when a business, owned by those people you think ‘demonised’ inspiration, takes the works of authors and mulches them into something they lovingly named “The Pile”, in order to create derivative slop off the backs of creatives.

    When you, as a “professional”, ask AI to write you a novel, who is being inspired? Who is making the connections between themes? Who is carefully crafting the text to pay loving reference to another author’s work? Not you. Not the algorithm that is guessing what word to shit out next based on math.

    These businesses have tricked you into thinking that what they are doing is noble.

    That's 100% rationalization. Machines have never done anything with "inspiration", and that's never been a problem until now. You probably don't insist that your food be hand-carried to you from a farm, or cooked over a fire you started by rubbing two sticks together. I think the mass reaction against AI is part of a larger pattern where people want to believe they're crusading against evil without putting out the kind of effort it takes to fight any of the genuine evils in the world.

  • Human learning requires understanding, which AI is not capable of.

    How could anyone know this?

    Is there some test of understanding that humans can pass and AIs can't? And if there are humans who can't pass it, do we consider them unintelligent?

    We don't even need to set the bar that high. Is there some definition of "understanding" that humans meet and AIs don't?

    It's literally in the phrase "statically optimized." This is like arguing for your preferred deity. It'll never be proven but we have evidence to make our own conclusions. As it is now, AI doesn't learn or understand the same way humans do.

  • It's literally in the phrase "statically optimized." This is like arguing for your preferred deity. It'll never be proven but we have evidence to make our own conclusions. As it is now, AI doesn't learn or understand the same way humans do.

    So you’re confident that human learning involves “understanding” which is distinct from “statistical optimization”. Is this something you feel in your soul or can you define the difference?

  • So you’re confident that human learning involves “understanding” which is distinct from “statistical optimization”. Is this something you feel in your soul or can you define the difference?

    Yes. You learned not to touch a hot stove either from experience or a warning. That fear was cemented by your understanding that it would hurt. An AI will tell you not to touch a hot stove (most of the time) because the words "hot", "stove", "pain", etc. pop up in its dataset together millions of times (a toy sketch of that word-association idea follows the pasted answer below). As things are, they're barely comparable. The only reason people keep arguing is because the output is very convincing. Go and download PyTorch and read some stuff, or Google it. I've even asked DeepSeek for you:

    Can AI learn and understand like people?

    AI can learn and perform many tasks similarly to humans, but its understanding is fundamentally different. Here’s how AI compares to human learning and understanding:

    1. Learning: Similar in Some Ways, Different in Others

    • AI Learns from Data: AI (especially deep learning models) improves by processing vast amounts of data, identifying patterns, and adjusting its internal parameters.
    • Humans Learn More Efficiently: Humans can generalize from few examples, use reasoning, and apply knowledge across different contexts—something AI struggles with unless trained extensively.

    2. Understanding: AI vs. Human Cognition

    • AI "Understands" Statistically: AI recognizes patterns and makes predictions based on probabilities, but it lacks true comprehension, consciousness, or awareness.
    • Humans Understand Semantically: Humans grasp meaning, context, emotions, and abstract concepts in a way AI cannot (yet).

    3. Strengths & Weaknesses

    ✔ AI Excels At:

    • Processing huge datasets quickly.
    • Recognizing patterns (e.g., images, speech).
    • Automating repetitive tasks.

    ❌ AI Falls Short At:

    • Common-sense reasoning (e.g., knowing ice melts when heated without being explicitly told).
    • Emotional intelligence (e.g., empathy, humor).
    • Creativity and abstract thinking (though AI can mimic it).

    4. Current AI (Like ChatGPT) is a "Stochastic Parrot"

    • It generates plausible responses based on training but doesn’t truly "know" what it’s saying.
    • Unlike humans, it doesn’t have beliefs, desires, or self-awareness.

    5. Future Possibilities (AGI)

    • Artificial General Intelligence (AGI)—a hypothetical AI with human-like reasoning—could bridge this gap, but we’re not there yet.

    Conclusion:

    AI can simulate learning and understanding impressively, but it doesn’t experience them like humans do. It’s a powerful tool, not a mind.

    Would you like examples of where AI mimics vs. truly understands?
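
As a footnote to the "words co-occur in the dataset" picture invoked above, here is a toy sketch of raw statistical association between words. It is a caricature by design: real LLMs learn distributed representations by gradient descent rather than counting word pairs, and the corpus here is invented purely for illustration.

```python
from collections import Counter
from itertools import combinations

# Tiny invented corpus; in a real training set these associations would be
# measured over billions of sentences rather than four.
corpus = [
    "the stove was hot and touching it caused pain",
    "she burned her hand on the hot stove",
    "the pain from the hot pan faded slowly",
    "the cat slept near the warm stove",
]

pair_counts = Counter()
for sentence in corpus:
    words = set(sentence.split())
    for a, b in combinations(sorted(words), 2):
        pair_counts[(a, b)] += 1      # count how often two words share a sentence

# Words that frequently appear together get a strong association score.
for pair in [("hot", "stove"), ("hot", "pain"), ("cat", "pain")]:
    print(pair, pair_counts[tuple(sorted(pair))])
```

Whether association at massive scale adds up to "understanding" is the question the thread is arguing over; the sketch only shows the kind of statistical signal being described.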

  • I'll repeat what you said with emphasis:

    AI can “learn” from and “read” a book in the same way a person can and does

    The emphasized part is incorrect. It's not the same, yet your argument seems to rest on the claim that it is the same, and therefore that it's no different from a human reading all of these books.

    Regarding your last point, copyright law doesn't just kick in because you try to pass something off as an original (by, for example, marketing a book as being from a best-selling author). It applies based on similarity whether you mention the original author or not.

    Are you taking that as me saying that they "learn in the same way" as in......by using their eyes to see it and ears to listen to it? You seem to be reading waaaaay too much into a simple sentence. AI "learns" by consuming the content. People learn by consuming the content.

    It applies based on similarity whether you mention the original author or not.

    That's if you're recreating something. Writing fan-fiction isn't a violation of copyright.

  • If you want to go to the extreme: delete first copy.

    You can; as I understand it, the only legal requirement is that you only use one copy at a time.

    i.e. I can give my book to a friend after I'm done reading it; I can make a copy of a book, keep one at home and one at the office, and switch off between reading them; but I'm not allowed to make a copy of the book, hand one to a friend, and then have both of us read it at the same time.

    That sounds a lot like library ebook renting. Makes sense to me. Ty

    Yes. You learned not to touch a hot stove either from experience or a warning. That fear was cemented by your understanding that it would hurt. An AI will tell you not to touch a hot stove (most of the time) because the words "hot", "stove", "pain", etc. pop up in its dataset together millions of times. As things are, they're barely comparable. The only reason people keep arguing is because the output is very convincing. Go and download PyTorch and read some stuff, or Google it. I've even asked DeepSeek for you:

    Can AI learn and understand like people?

    AI can learn and perform many tasks similarly to humans, but its understanding is fundamentally different. Here’s how AI compares to human learning and understanding:

    1. Learning: Similar in Some Ways, Different in Others

    • AI Learns from Data: AI (especially deep learning models) improves by processing vast amounts of data, identifying patterns, and adjusting its internal parameters.
    • Humans Learn More Efficiently: Humans can generalize from few examples, use reasoning, and apply knowledge across different contexts—something AI struggles with unless trained extensively.

    2. Understanding: AI vs. Human Cognition

    • AI "Understands" Statistically: AI recognizes patterns and makes predictions based on probabilities, but it lacks true comprehension, consciousness, or awareness.
    • Humans Understand Semantically: Humans grasp meaning, context, emotions, and abstract concepts in a way AI cannot (yet).

    3. Strengths & Weaknesses

    ✔ AI Excels At:

    • Processing huge datasets quickly.
    • Recognizing patterns (e.g., images, speech).
    • Automating repetitive tasks.

    ❌ AI Falls Short At:

    • Common-sense reasoning (e.g., knowing ice melts when heated without being explicitly told).
    • Emotional intelligence (e.g., empathy, humor).
    • Creativity and abstract thinking (though AI can mimic it).

    4. Current AI (Like ChatGPT) is a "Stochastic Parrot"

    • It generates plausible responses based on training but doesn’t truly "know" what it’s saying.
    • Unlike humans, it doesn’t have beliefs, desires, or self-awareness.

    5. Future Possibilities (AGI)

    • Artificial General Intelligence (AGI)—a hypothetical AI with human-like reasoning—could bridge this gap, but we’re not there yet.

    Conclusion:

    AI can simulate learning and understanding impressively, but it doesn’t experience them like humans do. It’s a powerful tool, not a mind.

    Would you like examples of where AI mimics vs. truly understands?

    That’s a very emphatic restatement of your initial claim.

    I can’t help but notice that, for all the fancy formatting, that wall of text doesn’t contain a single line which actually defines the difference between “learning” and “statistical optimization”. It just repeats the claim that they are different without supporting that claim in any way.

    Nothing in there precludes the alternative hypothesis: that human learning is entirely (or almost entirely) an emergent property of “statistical optimization”. Without some definition of what the difference would be, we can't even theorize a test.

