The AI company Perplexity is complaining their bots can't bypass Cloudflare's firewall
-
Perplexity argues that a platform’s inability to differentiate between helpful AI assistants and harmful bots causes misclassification of legitimate web traffic.
So, I assume Perplexity uses appropriate identifiable user-agent headers, to allow hosters to decide whether to serve them one way or another?
It's not up to the hoster to decide whom to serve content to. The web is intended to be user-agent agnostic.
-
It's insane that anyone would side with Cloudflare here. To this day I can't visit many websites like nexusmods just because I run Firefox on Linux. The Cloudflare Turnstile just refreshes infinitely and has been doing so for months now.
Cloudflare is the biggest cancer on the web, fucking burn it.
omg ur a hacker
Did you mean Edge on Windows? 'Cause if so, welcome in!
-
It's insane that anyone would side with Cloudflare here. To this day I can't visit many websites like nexusmods just because I run Firefox on Linux. The Cloudflare Turnstile just refreshes infinitely and has been doing so for months now.
Cloudflare is the biggest cancer on the web, fucking burn it.
I'm on Linux with Firefox and have never had that issue before (particularly nexusmods which I use regularly). Something else is probably wrong with your setup.
-
Perplexity argues that a platform’s inability to differentiate between helpful AI assistants and harmful bots causes misclassification of legitimate web traffic.
So, I assume Perplexity uses appropriate identifiable user-agent headers, to allow hosters to decide whether to serve them one way or another?
And I'm assuming that if a site's robots.txt states their user agent isn't allowed to crawl, it obeys that, right?
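For anyone wondering what "obeying robots.txt" looks like on the crawler side, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleAIBot` token, and the `may_fetch` helper are made up for illustration; this is not how Perplexity or any particular crawler is known to work.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, for illustration only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/page"
    # The named bot is disallowed everywhere; other agents fall under "*".
    print(may_fetch(ROBOTS_TXT, "ExampleAIBot", url))
    print(may_fetch(ROBOTS_TXT, "Mozilla/5.0", url))
```

The check is purely voluntary: nothing stops a crawler from skipping it or from sending a different user agent. Note also that `can_fetch` matches on the product token before the first `/`, so the helper should be given the bot's name rather than a full browser-style header string.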
-
I'm out of the loop, what's wrong with Cloudflare?
Centralization, mostly, but also their hands-off approach to most fascist content.
-
DoS attacks are already a crime, so of course the need for some kind of solution is clear. But any proposal that gatekeeps the internet and restricts the freedom with which users can interact with it is no solution at all. To me, the openness of the web shouldn't be something that people merely consider, or are amenable to. It should be the foundation that every reasonable proposal treats as a core principle.
How "open" a website is, is up to the owner, and that's all. Unless we're talking about de-privatizing the internet as a whole, here.
-
I'm on Linux with Firefox and have never had that issue before (particularly nexusmods which I use regularly). Something else is probably wrong with your setup.
In my case, it's usually the VPN.
-
They can't get their AI to check a box that says "I am not a robot"? I'd think that'd be a first-year comp sci student level task. And robots.txt files were basically always voluntary compliance anyway.
reCAPTCHA v2 does way more than check if the box was checked.
How does Google reCAPTCHA v2 work behind the scenes?
This post refers to Google ReCaptcha v2 (not the latest version) Recently Google introduced a simplified "captcha" verification system (video) that enables users to pass the "captcha" just by clic...
Stack Overflow (stackoverflow.com)
-
Is there some simply deployable PHP honeytrap for AI crawlers?
You could probably route all requests to your site from them back at themselves, so they DDoS themselves, and on top of it, cost them more because their endpoint needs to process things via their LLM.
-
First we complain that AI steals and trains on our data. Then we complain when it doesn't train. Cool.
I think it boils down to "consent" and "remuneration".
I run a website that I do not consent to being accessed for LLMs. However, should LLMs use my content, I should be compensated for such use.
So, these LLM startups ignore both consent and the idea of remuneration.
Most of these concepts have already been figured out for the purpose of law, if we consider websites much akin to real estate: the typical trespass laws, compensatory usage, and hell, even eminent domain if needed (i.e., a city government can "take over" the boosted post feature to make sure alerts get pushed as widely and quickly as possible).
-
The number of people just reacting to the headline in the comments on these kinds of articles is always surprising.
Your browser acts as an agent too, you don’t manually visit every script link, image source and CSS file. Everyone has experienced how annoying it is to have your browser be targeted by Cloudflare.
There’s a pretty major difference between a human user loading a page and having it summarized and a bot that is scraping 1500 pages/second.
Cheering for Cloudflare to be the arbiter of what technologies are allowed is incredibly short sighted. They exist to provide their clients with services, including bot mitigation. But a user initiated operation isn’t the same as a bot.
Which is the point of the article and the article’s title.
It isn’t clear why OP had to alter the headline to bait the anti-ai crowd.
Cheering for Cloudflare to be the arbiter of what technologies are allowed is incredibly short sighted
Except, they don't. It's a toggle, available to users, and by default, allows Perplexity's scraping.
-
There’s no difference in server load between a user looking at a page and a user using an AI tool to summarize the page.
The AI companies already overwhelmed sites to get training data and are repeating their shitty scraping practices when users interact with their AI. It’s the same fucking thing.
You either didn’t read the article or are deliberately making bad faith arguments. The entire point of the article is that the traffic that they’re referring to is initiated by a user, just like when you type an address into your browser’s address bar.
This traffic, initiated by a user, creates the same server load as that same user loading the page in a browser.
Yes, mass scraping of web pages creates a bunch of server load. This was the case before AI was even a thing.
This situation is like Cloudflare presenting a captcha in order to load each individual image, CSS or JavaScript asset into a web browser because bot traffic pretends to be a browser.
I don’t think it’s too hard to understand that a bot pretending to be a browser and a human operated browser are two completely different things and classifying them as the same (and captchaing them) would be a classification error.
This is exactly the same kind of error. Even if you personally believe that users using AI tools should be blocked, not everyone has the same opinion. If Cloudflare can’t distinguish between bot requests and human requests then their customers can’t opt out and allow their users to use AI tools even if they want to.
There’s no difference in server load between a user looking at a page and a user using an AI tool to summarize the page.
There is, in scale.
-
It's insane that anyone would side with Cloudflare here. To this day I can't visit many websites like nexusmods just because I run Firefox on Linux. The Cloudflare Turnstile just refreshes infinitely and has been doing so for months now.
Cloudflare is the biggest cancer on the web, fucking burn it.
Linux and Firefox here. No problem at all with Cloudflare, despite having more or less as many privacy-preserving add-ons as possible. I even spoof my user agent to the latest Firefox ESR on Linux.
Something may be wrong with your setup.
-
Uh, are they admitting they are trying to circumvent technological protections set up to restrict access to a system?
Isn’t that a literal computer crime?
puts on evil hat CloudFlare should DRM their protection then DMCA Perplexity and other US based "AI" companies to oblivion. Side effect, might break the Internet.
-
I'm on Linux with Firefox and have never had that issue before (particularly nexusmods which I use regularly). Something else is probably wrong with your setup.
"Wrong with my setup" - that's not how the internet works.
I'm based in Southeast Asia and often work on the road, so my IP rating is probably the deciding factor in my fingerprint score.
Either way, this should in no way be acceptable.
-
I hate to break it to you but not only does Cloudflare do this sort of thing, but so does Akamai, AWS, and virtually every other CDN provider out there. And far from being awful, it’s actually protecting the web.
We use Akamai where I work, and they inform us in real time when a request comes from a bot, and they further classify it as one of a dozen or so bots (search engine crawlers, analytics bots, advertising bots, social networks, AI bots, etc). It also informs us if it’s somebody impersonating a well known bot like Google, etc. So we can easily allow search engines to crawl our site while blocking AI bots, bots impersonating Google, and so on.
By "things like this are awful for the web," I meant that automation through AI is awful for the web. It takes away from the original content creators without any attribution and hits their bottom line.
My story was supposed to be one about responsible AI, but somehow I screwed that up in my summary.
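The allow-search-engines, block-AI-bots policy described above can be sketched roughly as follows. This is only an illustration based on user-agent substrings; the category table and the `classify` and `should_serve` helpers are assumptions made up for this example, not Akamai's or Cloudflare's actual API. Real CDNs rely on many more signals, and a plain string match like this cannot detect a bot impersonating Googlebot.

```python
from typing import Optional

# Illustrative mapping from user-agent token to bot category (assumed, not exhaustive).
BOT_CATEGORIES = {
    "googlebot": "search",
    "bingbot": "search",
    "gptbot": "ai",
    "perplexitybot": "ai",
}

# Categories this hypothetical site chooses to block.
BLOCKED_CATEGORIES = {"ai"}


def classify(user_agent: str) -> Optional[str]:
    """Return the bot category for a User-Agent header, or None if no token matches."""
    ua = user_agent.lower()
    for token, category in BOT_CATEGORIES.items():
        if token in ua:
            return category
    return None


def should_serve(user_agent: str) -> bool:
    """Serve everything except requests whose category is blocked."""
    return classify(user_agent) not in BLOCKED_CATEGORIES


if __name__ == "__main__":
    for ua in (
        "Mozilla/5.0 (compatible; Googlebot/2.1)",
        "Mozilla/5.0 (compatible; PerplexityBot/1.0)",
        "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0",
    ):
        print(ua, "->", classify(ua), should_serve(ua))
```

Because the decision depends entirely on a header the client controls, it only works against bots that identify themselves honestly, which is the crux of the dispute in this thread.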
-
Linux and Firefox here. No problem at all with Cloudflare, despite having more or less as many privacy-preserving add-ons as possible. I even spoof my user agent to the latest Firefox ESR on Linux.
Something may be wrong with your setup.
That's not how it works. Cloudflare uses thousands of variables to estimate a trust score and block people, so just because it works for you doesn't mean it works for everyone.
-
puts on evil hat CloudFlare should DRM their protection then DMCA Perplexity and other US based "AI" companies to oblivion. Side effect, might break the Internet.
Worth it.
-
I hope this isn't too harsh, but I hope their bots fail and the company loses funding from all its investors for being such a big failure.
-
I'm on Linux with Firefox and have never had that issue before (particularly nexusmods which I use regularly). Something else is probably wrong with your setup.
Thirded. All three (Linux, FF, nexus)
ZERO ISSUES.
-