The AI company Perplexity is complaining their bots can't bypass Cloudflare's firewall
-
The original comment reply to you was all about how the legal system would act, that's the primary concern. All it would take is a Trump loyalist judge, a Trump leaning appeals court and the right-wing Supreme Court and boom suddenly the CFAA covers a whole lot more than what was "logical"
The original comment reply to me was all about how the legal system would act in the context of the CFAA specifically. And in that context that logic does not follow. There's not much latitude for any judge to interpret the CFAA that way.
They could always push through some new law however.
-
If I put a banner on my site that says "by visiting my site you agree not to modify the scripts or ads displayed on the site," does that make my visit with an ad blocker "unauthorized" under the CFAA?
How would you “authorize” a user to access assets served by your systems based on what they do with them after they've accessed them? That doesn’t logically follow so no, that would not make an ad blocker unauthorized under the CFAA. Especially because you’re not actually taking any steps to deny these people access either.
AI scrapers on the other hand are a type of user that you’re not authorizing to begin with, and if you’re using Cloudflare's bot protection you’re putting into place a system to deny them access. To purposefully circumvent that access would be considered unauthorized.
That doesn’t logically follow so no, that would not make an ad blocker unauthorized under the CFAA.
The CFAA also criminalizes "exceeding authorized access" in every place it criminalizes accessing without authorization. My position is that mere permission (in a colloquial sense, not necessarily technical IT permissions) isn't enough to define authorization. Social expectations and even contractual restrictions shouldn't be enough to define "authorization" in this criminal statute.
To purposefully circumvent that access would be considered unauthorized.
Even as a normal non-bot user who sees the cloudflare landing page because they're on a VPN or happen to share an IP address with someone who was abusing the network? No, circumventing those gatekeeping functions is no different than circumventing a paywall on a newspaper website by deleting cookies or something. Or using a VPN or relay to get around rate limiting.
The idea of criminalizing scrapers or scripts would be a policy disaster.
-
I mean, that's just capitalism.
Just wait till the bear is lobbying the game warden to put ankle weights on every rabbit. Also the bear would like an assault rifle. Stop being so anti-bear.
So that he doesn't have to run after the rabbits, he will learn to raise them and manage them with a fake smile, providing them with a stable life lol.
Well, I think the thing is that we still live by the law: the strong do what they want, and the weak just whine and complain.
-
They already prosecute people under the unauthorized access provision. They just don’t prosecute rich people under it.
They prosecuted and convicted a guy under the CFAA for figuring out the URL schema for an AT&T website designed to be accessed by the iPad when it first launched, and then just visiting that site by trying every URL in a script. And then his lawyer (the foremost expert on the CFAA) got his conviction overturned:
United States v. Andrew Auernheimer
Andrew “Weev” Auernheimer was convicted of violating the Computer Fraud and Abuse Act ("CFAA") in New Jersey federal court and sentenced to 41 months in federal prison in March of 2013 for revealing to media outlets that AT...
Electronic Frontier Foundation (www.eff.org)
We have to maintain that fight, to make sure that the legal system doesn't criminalize normal computer tinkering, like using scripts or even browser settings in ways that site owners don't approve of.
-
That’s the entire point, dipshit. I wish we got one of the cool techno dystopias rather than this boring corporate idiot one.
I'm still holding out for Stephen Hawking to mail out Demon Summoning programs.
-
Perplexity argues that a platform’s inability to differentiate between helpful AI assistants and harmful bots causes misclassification of legitimate web traffic.
So, I assume Perplexity uses appropriate identifiable user-agent headers, to allow hosters to decide whether to serve them one way or another?
-
I think in Cloudflare’s case the free tier website owners are more an example of just giving the users a limited product in hopes of enticing them to upgrade to the paid product with more features and better performance. Cloudflare might get some benefit in the ability to track end-users across more websites as part of their efforts to determine who is a real human versus a potentially-malicious bot, but I don’t think that really gives the same ROI that Facebook or other services extract from their “free” services where the users are the actual product.
Actually, they've said that their free tier is what gives them a paid tier to sell to other people. They know most people aren't going to buy anything from them, but they are fine with that because they get to collect a ton of data about who is using hundreds of thousands of websites in order to figure out what traffic is bad. Without that huge user base, they can't do what they do.
And judging from the article, it's working out for them.
-
When sites put challenges like Anubis or other measures to authenticate that the viewer isn't a robot, and scrapers then employ measures to thwart that authentication (via spoofing or other means) I think that's a reasonable violation of the CFAA in spirit — especially since these mass scraping activities are getting attention for the damage they are causing to site operators (another factor in the CFAA, and one that would promote this to felony activity.)
The fact is these laws are already on the books, we may as well utilize them to shut down this objectively harmful activity AI scrapers are doing.
That same logic is how Aaron Swartz was cornered into suicide for scraping JSTOR. That reading of the CFAA is widely agreed to be a bad idea by a wide range of legal experts, including SCOTUS, whose 2021 decision Van Buren v. United States struck this interpretation off the books.
-
You could say they are... Perplexed.
-
This is a nice Cloudflare ad
yeah. still not worth dealing with fucking cloudflare. fuck cloudflare.
-
yeah. still not worth dealing with fucking cloudflare. fuck cloudflare.
DEATH TO CLOUDFLARE!
-
Perplexity argues that a platform’s inability to differentiate between helpful AI assistants and harmful bots causes misclassification of legitimate web traffic.
So, I assume Perplexity uses appropriate identifiable user-agent headers, to allow hosters to decide whether to serve them one way or another?
yeah it's almost like there was already a system for this in place
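For anyone unsure what that existing system looks like in practice, here is a minimal sketch, assuming it means an honest User-Agent header plus robots.txt. It uses Python's standard `urllib.robotparser`. The bot name `ExampleAIBot`, the rules, and the example.com URL are all made up for illustration; they are not Perplexity's real identifiers or anyone's real policy.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one self-identified AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed("ExampleAIBot", "https://example.com/page"))
print(is_allowed("Mozilla/5.0", "https://example.com/page"))
```

This only works if the crawler sends its real name and chooses to honor the answer, which is the whole point of the question being asked above.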
-
Nah, that would also mean using Newpipe, YoutubeDL, Revanced, and Tachiyomi would be a crime, and it would only take the re-introduction of WEI to extend that criminalization to the rest of the web ecosystem. It would be extremely shortsighted and foolish of me to cheer on the criminalization of user spoofing and browser automation because of this.
Do you think DoS/DDoS activities should be criminal?
If you're a site operator and the mass AI scraping is genuinely causing operational problems (not hard to imagine, I've seen what it does to my hosted repositories pages) should there be recourse? Especially if you're actively trying to prevent that activity (revoking consent in cookies, authorization captchas).
In general I think the idea of "your right to swing your fists ends at my face" applies reasonably well here — these AI scraping companies are giving lots of admins bloody noses and need to be held accountable.
I really am amenable to arguments wrt the right to an open web, but look at how many sites are hiding behind CF and other portals, or outright becoming hostile to any scraping at all; we're already seeing the rapid death of the ideal because of these malicious scrapers, and we should be using all available recourse to stop this bleeding.
-
Is there some simply deployable PHP honeytrap for AI crawlers?
Used to make tarpits with reverse proxies. Accept the connection and then send the response only a few seconds before the default TCP timeout. It doesn't eat many resources as long as you have enough TCP connections and can reuse them effectively.
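Not PHP and not a reverse proxy config, but the same tarpit idea can be sketched in a few lines of Python with `asyncio`. This is only an illustration under assumptions: the names `tarpit_handler` and `serve`, the port, and the timing values are made up, and the right hold time depends on the timeout of whatever client you are trying to stall.

```python
import asyncio


async def tarpit_handler(reader, writer, hold_seconds=25.0, interval=5.0):
    """Accept a connection, then drip one byte per interval before closing."""
    # Send a plausible status line and headers so the client waits for a body.
    writer.write(b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n")
    await writer.drain()
    elapsed = 0.0
    try:
        while elapsed < hold_seconds:
            # Sleeping costs almost nothing; the crawler's worker stays tied up.
            await asyncio.sleep(interval)
            elapsed += interval
            writer.write(b" ")
            await writer.drain()
    except (ConnectionResetError, BrokenPipeError):
        # The client gave up early, which is fine.
        pass
    finally:
        writer.close()


async def serve(host="127.0.0.1", port=8081):
    """Run the tarpit until cancelled (host and port are placeholder values)."""
    server = await asyncio.start_server(tarpit_handler, host, port)
    async with server:
        await server.serve_forever()
```

Start it with `asyncio.run(serve())` and route only the trap URLs to it, for example a path that robots.txt disallows, so ordinary visitors never land there.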
-
Do you think DoS/DDoS activities should be criminal?
If you're a site operator and the mass AI scraping is genuinely causing operational problems (not hard to imagine, I've seen what it does to my hosted repositories pages) should there be recourse? Especially if you're actively trying to prevent that activity (revoking consent in cookies, authorization captchas).
In general I think the idea of "your right to swing your fists ends at my face" applies reasonably well here — these AI scraping companies are giving lots of admins bloody noses and need to be held accountable.
I really am amenable to arguments wrt the right to an open web, but look at how many sites are hiding behind CF and other portals, or outright becoming hostile to any scraping at all; we're already seeing the rapid death of the ideal because of these malicious scrapers, and we should be using all available recourse to stop this bleeding.
DoS attacks are already a crime, so of course the need for some kind of solution is clear. But any proposal that gatekeeps the internet and restricts the freedom with which the user can interact with it is no solution at all. To me, the openness of the web shouldn't be something that people just consider, or are amenable to. It should be the foundation that all reasonable proposals treat as a first principle.
-
Right? Isn’t this a textbook DMCA violation, too?
for us, not for them. wait until they argue in court that actually it's us at fault and we need to provide access or else