
The AI company Perplexity is complaining their bots can't bypass Cloudflare's firewall

  • gaining unauthorized access to a computer system

    And my point is that defining "unauthorized" to include visitors using unauthorized tools/methods to access a publicly visible resource would be a policy disaster.

    If I put a banner on my site that says "by visiting my site you agree not to modify the scripts or ads displayed on the site," does that make my visit with an ad blocker "unauthorized" under the CFAA? I think the answer should obviously be "no," and that the way to define "authorization" is whether the website puts up some kind of login/authentication mechanism to block or allow specific users, not to put a simple request to the visiting public to please respect the rules of the site.

    To me, a robots.txt is more like a friendly request to unauthenticated visitors than it is a technical implementation of some kind of authentication mechanism.

    Scraping isn't hacking. I agree with the Third Circuit and the EFF: If the website owner makes a resource available to visitors without authentication, then accessing those resources isn't a crime, even if the website owner didn't intend for site visitors to use that specific method.

    Site owners currently do and should have the freedom to decide who is and is not allowed to access the data, and to decide for what purpose it gets used for. Idgaf if you think scraping is malicious or not; it is, and should be, illegal to violate clear and obvious barriers against scrapers, at the cost of the site owners and for the unsanctioned profit of the scrapers off the owners' work.

  • yeah it's almost like there was already a system for this in place

    THE CAKE DAY IS NOW. (i don't have an image at hand)

  • i really wish we wouldn't do those. feels too reddity.

    but thanks.

  • as you wish

  • *monkeys paw curls and i turn into cake*

  • The shit they know. Plus their support for non-JS users or Tor is pure shite

    Yeah, a few sites outright refuse to work because cloudflare just poops.
    EDIT: It was supposed to say "loops", but I'm keeping it.

  • puts on evil hat CloudFlare should DRM their protection then DMCA Perplexity and other US based "AI" companies to oblivion. Side effect, might break the Internet.

    The Internet was already ruined; Cloudflare is just band-aids on top of band-aids.

  • It's been this from the very beginning. But they don't fit the definition of a protection racket as they're not the ones attacking you if you don't pay up. So they're more like a security company that has no competitors due to the needed investment to operate.

    Cloudflare are notorious for shielding cybercrime sites. You can't even complain to Cloudflare about abuse by those sites; they'll just forward your abuse complaint to the likely dodgy host of the cybercrime site. They don't even have a channel for complaints about network abuse of their DNS services.

    So they certainly are an enabler of the cybercriminals they purport to protect people from.

  • Can't believe I've lived to see Cloudflare be the good guys

    Lesser of two bad guys maybe?

  • If they acted differently, they'd probably be liable for illegal activity that they proxy for (this is for example relevant for the DMCA safe harbor).

    Anyhow, when on their abuse page, I have an option for "Registrar", which is used for "DNS abuse", among others.

  • Any internet service provider needs to be completely neutral. Not only in their actions, but also in their liability.
    Same goes for other services like payment processors.
    If companies that provide content-agnostic services are allowed to police the content, that opens the door to really nasty stuff.

    You can't chop everyone's arms to stop a few people from stealing.

    If they think their services are being used in a reprehensible manner, what they need to do is alert the authorities, not act like vigilantes.

  • That all sounds very vague to me, and I don't expect it to be captured properly by law any time soon. Being accessed for LLM? What does it mean for you and how is it different from being accessed by a user? Imagine you host a weather forecast. If that information is public, what kind of compensation do you expect from anyone or anything who accesses that data?

    Is it okay for a person to access your site? Is it okay for a script written by that person to fetch data every day automatically? Would it be okay for a user to dump a page of your site with a headless browser? Would it be okay to let an LLM take a look at it to extract info required by a user? Have you heard about the changedetection.io project? If some of these sound unfair to you, you might want to put DRM on your data or something.

    Would you expect compensation from me after reading your comment?

    That all sounds very vague to me, and I don’t expect it to be captured properly by law any time soon.

    It already has been captured, properly in law, in most places. We can use the US as an example: Both intellectual property and real property have laws already that cover these very items.

    What does it mean for you and how is it different from being accessed by a user?

    Well, does a user burn up gigawatts of power to access my site every time? That's a huge difference.

    Imagine you host a weather forecast. If that information is public, what kind of compensation do you expect from anyone or anything who accesses that data?

    Depends on the terms of service I set for that service.

    Is it okay for a person to access your site?

    Sure!

    Is it okay for a script written by that person to fetch data every day automatically?

    Sure! As long as it doesn't cause problems for me, the creator and hoster of said content.

    Would it be okay for a user to dump a page of your site with a headless browser?

    See above. Both power usage and causing problems for me.

    Would it be okay to let an LLM take a look at it to extract info required by a user?

    No. As I said, I do not want my content and services to be used by and for LLMs.

    Have you heard about changedetection.io project?

    I have now. And should a user want to use that service, that service, which charges 8.99/month for it, needs to pay me a portion of that, or risk having its service blocked.

    There's no need to use it, as I already provide RSS feeds for my content. Use the RSS feed if you want updates.

    If some of these sound unfair to you, you might want to put DRM on your data or something.

    Or I can just block them via a service like Cloudflare. Which I do.

    Would you expect compensation from me after reading your comment?

    None. Unless you're wanting to access it via an LLM. Then I want compensation for the profit-driven access to my content.

  • I don't see a problem here. Maybe Perplexity should consider the reasons WHY Cloudflare have a firewall...?

  • Recaptcha v2 does way more than check if the box was checked.

    you're not wrong, but it also lets more than 99.8% of bot traffic through on text challenges. It's like the TSA of website security. It's mostly there to keep the user busy while Cloudflare places itself as a man in the middle of your encrypted connection to a third party. The only difference between Cloudflare and a malicious attacker is Cloudflare's stated intention not to be evil. With that and 3 dollars I can buy myself a single hard-shell taco from Taco Bell.

  • to decide for what purpose it gets used for

    Yeah, fuck everything about that. If I'm a site visitor I should be able to do what I want with the data you send me. If I bypass your ads, or use your words to write a newspaper article that you don't like, tough shit. Publishing information is choosing not to control what happens to the information after it leaves your control.

    Don't like it? Make me sign an NDA. And even then, violating an NDA isn't a crime, much less a felony punishable by years of prison time.

    Interpreting the CFAA to cover scraping is absurd and draconian.

  • Both intellectual property and real property have laws already that cover these very items.

    And it causes a lot of trouble to many people and pains me specifically. Information should not be gated or owned in a way that would make it illegal for anyone to access it under proper conditions. License expiration causing digital work to die out, DRM causing software to break, idiotic license owners not providing appropriate service, etc.

    Well, does a user burn up gigawatts of power, to access my site every time?

    Doing a GET request doesn't do that.

    As long as it doesn't cause problems for me, the creator and hoster of said content.

    What kind of problems would that be?

    Both power usage and causing problems for me.

    ?? How? And what?

    do not want my content and services to be used by and for LLMs.

    You have to agree that at one point "be used by LLM" would not be different from "be used by a user".

    which charges 8.99/month

    It's self-hosted and free.

    Use the RSS feed, if you want updates.

    How does that prohibit usage and processing of your info? That sounds like "I won't be providing any comments on Lemmy website, if you want my opinion you can mail me at a@b.com"

    I can just block them via a service like Cloudflare. Which I do.

    That will never block all of them. Your info will be used without your consent and you will not feel troubled by it. So you might not feel troubled if more things do the same.

    None. Unless you're wanting to access it via an LLM. Then I want compensation for the profit-driven access to my content.

    What if I use my locally hosted LLM? Anyway, the point is, selling text can't work well, and you're going to spend far more resources collecting and summarizing data about how your text was used and how others benefited from it, in order to get compensation, than it's worth.

    Also, it might be the case that some information is actually worthless when compared to a service provided by things like LLM, even though they use that worthless information in the process.

    I'm all for killing off LLMs, btw. Concerns of site makers who think they are being damaged by things like Perplexity are nothing compared to what LLMs do to the world. Maybe laws should instead make it illegal to waste energy. Before energy becomes the main currency.

  • If you want anybody and everyone to be able to use everything you post for any purpose, right on, good for you, but don't try to force your morality on others who rely on their writing, programming, and artworks to make a living to survive.

  • I'm gonna continue to use ad blockers and yt-dlp, and if you think I'm a criminal for doing so, I'm gonna say you don't understand either technology or criminal law.

  • That's a crime, yeah, and if Alphabet Co. wants to sue you for $1.34 in damages, then they have that right, just as we should have the right to sue them if their AI crawlers make our site unusable and plagiarize our work to the tune of thousands of dollars, or even press charges for the criminal act of intentional disruption of services.

  • Information should not be gated or owned in a way that would make it illegal for anyone to access it under proper conditions.

    Then you don't believe content creators should have any control over their own works?

    The "proper conditions" are deemed by the content creator, not the consumers.

    Doing a GET request doesn’t do that.

    Not at all. It consumes, at most, a watt.

    What kind of problems would that be?

    Increasing my hosting bill, to accommodate the senseless traffic being sent my way?

    Outages for my site, making my content unavailable for legitimate users?

    You have to agree that at one point “be used by LLM” would not be different from “be used by a user”.

    Not at all. LLMs are not users.

    It’s self-hosted and free.

    Only if you self-host it; otherwise they charge for the hosted version. If they want to use a paid version, then they can divert some of that revenue to me, the creator, because without creators, they would have no product.

    How does that prohibit usage and processing of your info? That sounds like “I won’t be providing any comments on Lemmy website, if you want my opinion you can mail me at a@b.com”

    That's an apples-and-oranges comparison, and you know it.

    That will never block all of them. Your info will be used without your consent and you will not feel troubled by it. So you might not feel troubled if more things do the same.

    Perplexity seems to be troubled by it.

    What if I use my locally hosted LLM? Anyway, the point is, selling text can’t work well, and you’re going to spend far more resources collecting and summarizing data about how your text was used and how others benefited from it, in order to get compensation, than it’s worth.

    If selling text can't work well, then why do LLM products insist on using my text, to sell it?

    Also, it might be the case that some information is actually worthless when compared to a service provided by things like LLM, even though they use that worthless information in the process.

    LLMs are a net negative, as far as costs go. They consume far more in resources than they provide in benefit. If my information was worthless without an LLM, it's worthless with an LLM, therefore, LLMs don't need to access it. Periodt.

    The bottom line? Content creators get the first say in how their content is used, and consumed. You are not entitled to their labor, for free, and without condition.
