The AI company Perplexity is complaining their bots can't bypass Cloudflare's firewall
-
The number of people just reacting to the headline in the comments on these kinds of articles is always surprising.
Your browser acts as an agent too; you don’t manually visit every script link, image source and CSS file. Everyone has experienced how annoying it is to have your browser be targeted by Cloudflare.
There’s a pretty major difference between a human user loading a page and having it summarized and a bot that is scraping 1500 pages/second.
Cheering for Cloudflare to be the arbiter of what technologies are allowed is incredibly shortsighted. They exist to provide their clients with services, including bot mitigation. But a user-initiated operation isn’t the same as a bot.
Which is the point of the article and the article’s title.
It isn’t clear why OP had to alter the headline to bait the anti-AI crowd.
I think part of the issue is that it does act more like a search engine crawler than a traditional user. A lot of sites rely on real human traffic for revenue (serving ads, requests to sign up for Patreon, using affiliate links, etc.) that gets bypassed by these bots. Hell, in some cases the people running the sites are just looking for interaction. So while there is a spike in traffic, and potentially cost, the people running these sites aren't getting the benefit of that traffic.
Basically these have the same issues as the summaries that Google does in their search results, but potentially have a much larger impact on the host's bandwidth.
-
Fuck that. I don't need prosecutors and the courts to rule that accessing publicly available information in a way that the website owner doesn't want is literally a crime. That logic would extend to ad blockers and editing HTML/JS in an "inspect element" tab.
They already prosecute people under the unauthorized access provision. They just don’t prosecute rich people under it.
-
gaining unauthorized access to a computer system
And my point is that defining "unauthorized" to include visitors using unauthorized tools/methods to access a publicly visible resource would be a policy disaster.
If I put a banner on my site that says "by visiting my site you agree not to modify the scripts or ads displayed on the site," does that make my visit with an ad blocker "unauthorized" under the CFAA? I think the answer should obviously be "no," and that the way to define "authorization" is whether the website puts up some kind of login/authentication mechanism to block or allow specific users, not to put a simple request to the visiting public to please respect the rules of the site.
To me, a robots.txt is more like a friendly request to unauthenticated visitors than it is a technical implementation of some kind of authentication mechanism.
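To make that concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the `ExampleBot` / `SomeBrowser` user-agent names are made up for illustration. The parser only reports what the file asks for; nothing in it stops a client that skips the check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Report whether the robots.txt above asks this user agent to stay away."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    # can_fetch() only answers the question; honoring the answer is
    # entirely up to the client making the request.
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed("ExampleBot", "https://example.com/page"))
    print(is_allowed("SomeBrowser", "https://example.com/page"))
```

A well-behaved crawler calls something like `is_allowed` before each fetch. A crawler that never calls it gets the same HTTP response either way, which is why this looks like a request rather than an authentication mechanism.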
Scraping isn't hacking. I agree with the Third Circuit and the EFF: If the website owner makes a resource available to visitors without authentication, then accessing those resources isn't a crime, even if the website owner didn't intend for site visitors to use that specific method.
If I put a banner on my site that says "by visiting my site you agree not to modify the scripts or ads displayed on the site," does that make my visit with an ad blocker "unauthorized" under the CFAA?
How would you “authorize” a user to access assets served by your systems based on what they do with them after they've accessed them? That doesn’t logically follow so no, that would not make an ad blocker unauthorized under the CFAA. Especially because you’re not actually taking any steps to deny these people access either.
AI scrapers, on the other hand, are a type of user that you’re not authorizing to begin with, and if you’re using Cloudflare's bot protection you’re putting into place a system to deny them access. To purposefully circumvent that system would be considered unauthorized access.
-
The fact is these laws are already on the books, we may as well utilize them to shut down this objectively harmful activity AI scrapers are doing.
Silly plebe! Those laws are there to target the working class, not to be used against corporations. See: Copyright.
-
Ehhhh, you are gaining access to content on the assumption that you are going to interact with ads and thus bring revenue to the person and/or company producing said content. If you block ads, you remove the authorisation that the ads brought you.
That doesn’t make any logical sense. You can’t tie legal authorization to an unsaid implicit assumption, especially when that is in turn based on what you do with the content you’ve retrieved from a system after you’ve accessed and retrieved it.
When you access a system, are you authorized to do so, or aren’t you? If you are, that authorization can’t be retroactively revoked. If that were the case, you could be arrested for having used a computer at a job, once you’ve quit. Because even though you were authorized to use it and your corporate network while you worked there, now that you’ve quit and are no longer authorized that would apply retroactively back to when you DID work there.
-
They must be A/B testing a new feature then, it’s not on mine
Log into your dashboard, click "AI Audit", and you'll see the toggles.
-
Please instruct me on how I go to the timeline where the legal system always makes decisions based on logic, reasoning, evidence and fairness and not...the opposite...of all those things
You have a lot of trust placed in the courts to actually do the right thing
I’m not saying courts couldn’t pass a new law saying whatever they want. But the laws we have today would not allow for ad blocking to be considered unauthorized access. Not under the CFAA as mentioned.
I said “The logic would not extend to that” not that a legal system could not act illogically.
-
So how would Cloudflare tell the difference between the good 'stripped down' queries and the bad? Still not hearing how that is supposed to work. If there's no way to tell the difference, the baby will be thrown out with the bathwater, and I can't blame them.
A large portion of this kind of traffic comes from identifiable sources, like Perplexity’s data centers, so Cloudflare could whitelist known safe sources. This seems to be what they’re doing now; a user replied to one of my comments saying their Cloudflare control panel now has the option of allowing AI queries from Perplexity.
Another way is to let users apply for session keys, provided they obey rate limits, and to whitelist users with valid session keys. Noncompliant accounts could be banned, perhaps with identity verification required to prevent ban avoidance.
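A rough sketch of how those two ideas could fit together, in Python. This is not how Cloudflare actually does it; the network range, the rate limit, and names like `allow_request` and `Session` are all assumptions picked for illustration.

```python
import ipaddress
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical values: a documentation-only IP range standing in for a
# "known safe" data center, and an arbitrary limit of 5 requests per second.
ALLOWLISTED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
RATE_LIMIT = 5
WINDOW_SECONDS = 1.0


@dataclass
class Session:
    banned: bool = False
    timestamps: List[float] = field(default_factory=list)


# Session keys that have been issued to users who applied for one.
sessions: Dict[str, Session] = {}


def allow_request(ip: str, session_key: Optional[str],
                  now: Optional[float] = None) -> bool:
    """Allow allowlisted sources, or valid session keys within the rate limit."""
    now = time.monotonic() if now is None else now
    address = ipaddress.ip_address(ip)
    if any(address in network for network in ALLOWLISTED_NETWORKS):
        return True
    session = sessions.get(session_key) if session_key else None
    if session is None or session.banned:
        return False
    # Keep only the requests that fall inside the sliding window.
    session.timestamps = [t for t in session.timestamps
                          if now - t < WINDOW_SECONDS]
    if len(session.timestamps) >= RATE_LIMIT:
        session.banned = True  # noncompliant key: ban it outright
        return False
    session.timestamps.append(now)
    return True
```

Banning on the first violation is just the simplest policy to show. A real system would more likely throttle first, and the identity verification part isn't modeled here at all.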
-
Traveling snake oil salesman complains he can't pick people's locks.
-
Is there some simply deployable PHP honeytrap for AI crawlers?
-
I can’t get over their CEO that looks like a nine year old. Not sure what it is about him
I think he grew the beard to look older, but then he put on weight, and let his hair get longer. The choice of glasses style isn't helping either. He's not a bad looking guy, he's just made a string of poor choices, I think.
-
Words cannot describe how much I hate this person
-
Good. I went through my CF panel, and blocked some of those "AI Assistants" that by default were open, including Perplexity's.
CF panel? Your light bulb??
-
This is a nice CloudFlare ad
-
It seems like it's some kind of distraction to make people think things aren't as bad as they really are, it just sounds too far-fetched to me.
It's like a bear that has eaten too much and starts whining because a small rabbit is running away from him, even though the bear has already eaten almost all the rabbits and is clearly full.
-
Most people aren’t technical enough to install an ad blocker, believe it or not.
-
When a firm outright admits to bypassing or trying to bypass measures taken to keep them out, you'd think that would be a slam dunk case of unauthorized access under the CFAA with felony enhancements.
Right? Isn’t this a textbook DMCA violation, too?
-
It seems like it's some kind of distraction to make people think things aren't as bad as they really are, it just sounds too far-fetched to me.
It's like a bear that has eaten too much and starts whining because a small rabbit is running away from him, even though the bear has already eaten almost all the rabbits and is clearly full.
I mean, that's just capitalism.
Just wait till the bear is lobbying the game warden to put ankle weights on every rabbit. Also the bear would like an assault rifle. Stop being so anti-bear.
-
I’m not saying courts couldn’t pass a new law saying whatever they want. But the laws we have today would not allow for ad blocking to be considered unauthorized access. Not under the CFAA as mentioned.
I said “The logic would not extend to that” not that a legal system could not act illogically.
The original comment replying to you was all about how the legal system would act; that's the primary concern. All it would take is a Trump loyalist judge, a Trump-leaning appeals court and the right-wing Supreme Court, and boom, suddenly the CFAA covers a whole lot more than what was "logical".
-
CF panel? Your light bulb??
CF == Cloudflare