The AI company Perplexity is complaining their bots can't bypass Cloudflare's firewall
-
This post did not contain any content.
Oh no!
-
You'd think that a competent technology company, with their own AI would be able to figure out a way to spoof Cloudflare's checks. I'd still think that.
Perplexity: "But that would cost us moneeyyyy!"
-
You'd think that a competent technology company, with their own AI would be able to figure out a way to spoof Cloudflare's checks. I'd still think that.
see, but they're not competent. further, they don't care. most of these ai companies are snake oil. they're selling you a solution that doesn't meaningfully solve a problem. their main way of surviving is saying "this is what it can do now, just imagine what it can do if you invest money in my company."
they're scammers, the lot of them, running ponzi schemes with our money. if the planet dies for it, that's no concern of theirs. ponzi schemes require the schemer to have no long term plan, just a line of credit that they can keep drawing from until they skip town before the tax collector comes
-
Cloudflare runs as a CDN/cache/gateway service in front of a ton of websites. Their service is to help protect against DDOS and malicious traffic.
A few weeks ago cloudflare announced they were going to block AI crawling (good, in my opinion). However they also added a paid service that these AI crawlers can use, so it actually becomes a revenue source for them.
This is a response to that from Perplexity who run an AI search company. I don’t actually know how their service works, but they were specifically called out in the announcement and Cloudflare accused them of “stealth scraping” and ignoring robots.txt and other things.
A few weeks ago cloudflare announced they were going to block AI crawling (good, in my opinion). However they also added a paid service that these AI crawlers can use, so it actually becomes a revenue source for them.
I think it's also worth pointing out that all of the big AI companies are currently burning through cash at an absolutely astonishing rate, and none of them are anywhere close to being profitable. So pay-walling the data they use is probably gonna be pretty painful for their already-tortured bottom line (good).
-
For a lot of AI search I actually end up reading the pages, so I don’t know how much this stops that
You're the outlier, I promise. People are literally forfeiting their brains in favor of an LLM transplant hese days.
-
I actually agree with them
This feels like cloudflare trying to collect rent from both sides instead of doing what’s best for the website owners.
There is a problem with AI crawlers, but these technologies are essentially doing a search, fetching a several pages, scanning/summarizing them, then presenting the findings to the user.
I don’t really think that’s wrong, it’s just a faster version of rummaging through the SEO shit you do when you Google something.
(I’ve never used perplexity, I do use Kagi’s ki assistant for similar search. It runs 3 searches and scans the top results and then provides citations)
If a neighborhood is beset by roving bands of thieves, sooner or later strangers will be greeted by a shotgun rather than an invitation to tea, regardless of their intentions. Them's the breaks. Bots are going to take a hit now and their operators are just going to have to deal with it. Sucks when people don't play nice, but this is what you get.
-
This post did not contain any content.
When a firm outright admits to bypassing or trying to bypass measures taken to keep them out, you think that would be a slam dunk case of unauthorized access under the CFAA with felony enhancements.
-
You're the outlier, I promise. People are literally forfeiting their brains in favor of an LLM transplant hese days.
On the flip side, most websites are so ad-ridden these days a reader mode or other summary tool is almost required for normal browsing. Not saying that AI is the right move, but I can understand not wanting to visit the actual page any more.
-
This post did not contain any content.
Just buy cloudflare duh
-
This post did not contain any content.
Cry more, Perplexity.
-
Cloudflare runs as a CDN/cache/gateway service in front of a ton of websites. Their service is to help protect against DDOS and malicious traffic.
A few weeks ago cloudflare announced they were going to block AI crawling (good, in my opinion). However they also added a paid service that these AI crawlers can use, so it actually becomes a revenue source for them.
This is a response to that from Perplexity who run an AI search company. I don’t actually know how their service works, but they were specifically called out in the announcement and Cloudflare accused them of “stealth scraping” and ignoring robots.txt and other things.
But the website owner can still choose to continue blocking them right? Without using additional stuff like Anubis that is.
-
This post did not contain any content.
The amount of people just reacting to the headline in the comments on these kinds of articles is always surprising.
Your browser acts as an agent too, you don’t manually visit every script link, image source and CSS file. Everyone has experienced how annoying it is to have your browser be targeted by Cloudflare.
There’s a pretty major difference between a human user loading a page and having it summarized and a bot that is scraping 1500 pages/second.
Cheering for Cloudflare to be the arbiter of what technologies are allowed is incredibly short sighted. They exist to provide their clients with services, including bot mitigation. But a user initiated operation isn’t the same as a bot.
Which is the point of the article and the article’s title.
It isn’t clear why OP had to alter the headline to bait the anti-ai crowd.
-
This post did not contain any content.
ask AI how to do it?
-
If a neighborhood is beset by roving bands of thieves, sooner or later strangers will be greeted by a shotgun rather than an invitation to tea, regardless of their intentions. Them's the breaks. Bots are going to take a hit now and their operators are just going to have to deal with it. Sucks when people don't play nice, but this is what you get.
I’m sure people that are attempting to drive to their house in a new vehicle wouldn’t appreciate being riddled with bullets because the neighborhood watch makes no attempt to distinguish between thieves and homeowners.
-
This post did not contain any content.
I set up a WAF for my company's publicly facing developer portal to block out bot traffic from assholes like these guys. It reduced bot traffic to the site by something like - I kid you not - 99.999%.
Fucking data vultures.
-
I’m sure people that are attempting to drive to their house in a new vehicle wouldn’t appreciate being riddled with bullets because the neighborhood watch makes no attempt to distinguish between thieves and homeowners.
So sad for them. Try not living in a war zone?
-
When a firm outright admits to bypassing or trying to bypass measures taken to keep them out, you think that would be a slam dunk case of unauthorized access under the CFAA with felony enhancements.
Fuck that. I don't need prosecutors and the courts to rule that accessing publicly available information in a way that the website owner doesn't want is literally a crime. That logic would extend to ad blockers and editing HTML/js in an "inspect element" tag.
-
The amount of people just reacting to the headline in the comments on these kinds of articles is always surprising.
Your browser acts as an agent too, you don’t manually visit every script link, image source and CSS file. Everyone has experienced how annoying it is to have your browser be targeted by Cloudflare.
There’s a pretty major difference between a human user loading a page and having it summarized and a bot that is scraping 1500 pages/second.
Cheering for Cloudflare to be the arbiter of what technologies are allowed is incredibly short sighted. They exist to provide their clients with services, including bot mitigation. But a user initiated operation isn’t the same as a bot.
Which is the point of the article and the article’s title.
It isn’t clear why OP had to alter the headline to bait the anti-ai crowd.
But a user initiated operation isn’t the same as a bot.
Oh fuck off with that AI company propaganda.
The AI companies already overwhelmed sites to get training data and are repeating their shitty scraping practices when users interact with their AI. It's the same fucking thing.
Web crawlers for search engines don't scrape pages every time a user searches like AI does. Both web crawlers and scrapers are bots, and how a human initiates their operation, scheduled or not, doesn't matter as much as the fact that they do things very differently and only one of the two respects robots.txt.
-
On the flip side, most websites are so ad-ridden these days a reader mode or other summary tool is almost required for normal browsing. Not saying that AI is the right move, but I can understand not wanting to visit the actual page any more.
On the flip side, most websites are so ad-ridden these days a reader mode or other summary tool is almost required for normal browsing.
Firefox with uBlock Origin works perfectly fine and pages load faster without the ads!
-
A few weeks ago cloudflare announced they were going to block AI crawling (good, in my opinion). However they also added a paid service that these AI crawlers can use, so it actually becomes a revenue source for them.
I think it's also worth pointing out that all of the big AI companies are currently burning through cash at an absolutely astonishing rate, and none of them are anywhere close to being profitable. So pay-walling the data they use is probably gonna be pretty painful for their already-tortured bottom line (good).
It's more than simply astonishing, it's mind-blowingly bonkers how much money they have to burn to see ANY amount of return. You think a normal company is bad, blowing a few thousand bucks on materials, equipment, and labor per day in order to make a few bucks revenue (not profit)? AI companies have to blow HUNDREDS OF BILLIONS on massive data center complexes in order to train their bots, and then the energy cost and water cost of running them adds a couple more million a day. ALL so they can make negative hundreds of dollars on every prompt you can dream of.
The ONLY reason AI firms are still a thing in the current tech tree is because Techbros everywhere have convinced the uberwealthy VC firms that AGI is RIGHT AROUND THE CORNER, and will save them SO much money on labor and efficiency that it'll all be worth it in permanent, pure, infinite profit. If that sounds like too much of a pipe dream to be realistic, congratulations, you're a sane and rational human being.