The AI company Perplexity is complaining their bots can't bypass Cloudflare's firewall
-
It's difficult to be a shittier company than OpenAI, but Perplexity seems to be trying hard.
-
Uh.. good?
-
I actually agree with them.
This feels like Cloudflare trying to collect rent from both sides instead of doing what’s best for the website owners.
There is a problem with AI crawlers, but these technologies are essentially doing a search, fetching several pages, scanning/summarizing them, then presenting the findings to the user.
I don’t really think that’s wrong; it’s just a faster version of rummaging through the SEO shit you do when you Google something.
(I’ve never used Perplexity, but I do use Kagi’s AI assistant for similar searches. It runs three searches, scans the top results, and then provides citations.)
-
Can someone with more knowledge shed a bit more light on this whole situation? I'm out of the loop on the technical details.
-
You'd think that a competent technology company with their own AI would be able to figure out a way to spoof Cloudflare's checks. I'd still think that.
-
I actually agree with them.
This feels like Cloudflare trying to collect rent from both sides instead of doing what’s best for the website owners.
There is a problem with AI crawlers, but these technologies are essentially doing a search, fetching several pages, scanning/summarizing them, then presenting the findings to the user.
I don’t really think that’s wrong; it’s just a faster version of rummaging through the SEO shit you do when you Google something.
(I’ve never used Perplexity, but I do use Kagi’s AI assistant for similar searches. It runs three searches, scans the top results, and then provides citations.)
What’s best for the website owners is to have people actually visit and interact with their website. Blocking AI tools is consistent with that.
-
Can someone with more knowledge shed a bit more light on this whole situation? I'm out of the loop on the technical details.
Cloudflare runs as a CDN/cache/gateway service in front of a ton of websites. Their service helps protect against DDoS attacks and malicious traffic.
A few weeks ago Cloudflare announced they were going to block AI crawling (good, in my opinion). However, they also added a paid service that these AI crawlers can use, so it actually becomes a revenue source for them.
This is a response to that from Perplexity, an AI search company. I don’t actually know how their service works, but they were specifically called out in the announcement, and Cloudflare accused them of “stealth scraping,” ignoring robots.txt, and other things.
-
I actually agree with them.
This feels like Cloudflare trying to collect rent from both sides instead of doing what’s best for the website owners.
There is a problem with AI crawlers, but these technologies are essentially doing a search, fetching several pages, scanning/summarizing them, then presenting the findings to the user.
I don’t really think that’s wrong; it’s just a faster version of rummaging through the SEO shit you do when you Google something.
(I’ve never used Perplexity, but I do use Kagi’s AI assistant for similar searches. It runs three searches, scans the top results, and then provides citations.)
Search engines have been running relatively fine for decades now. But the crawlers from AI companies basically DDoS hosts in comparison, sending so many requests in such a short interval. They also crawl dynamic links that are expensive to render compared to a static page, and they ignore robots.txt entirely or even use it to discover unlinked pages.
Servers have finite resources, especially self-hosted sites, while AI companies have disproportionately more at their disposal and can easily grind other systems to a halt by overwhelming them with requests.
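For what it's worth, the usual way a crawler avoids hammering a host is a per-host delay between requests. Here is a minimal Python sketch of that idea. `HostThrottle` and the interval value are made up for illustration; they don't come from any real crawler, and this is not a claim about how any AI company's bots actually work.

```python
import time
from urllib.parse import urlsplit


class HostThrottle:
    """Enforce a minimum delay between requests to the same host (hypothetical helper)."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between hits on one host (assumed value)
        self._clock = clock               # injectable so the logic can be tested without waiting
        self._sleep = sleep
        self._next_allowed = {}           # host -> earliest time the next request may go out

    def wait(self, url):
        """Block until the host of `url` may be contacted again; return the seconds slept."""
        host = urlsplit(url).netloc
        now = self._clock()
        ready_at = self._next_allowed.get(host, now)
        delay = max(0.0, ready_at - now)
        if delay:
            self._sleep(delay)
        # Schedule the next permitted request one interval after this one.
        self._next_allowed[host] = max(now, ready_at) + self.min_interval
        return delay
```

A crawler would call `throttle.wait(url)` right before each request, so a burst of links to one small self-hosted site gets spread out while requests to other hosts are unaffected.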
-
What’s best for the website owners is to have people actually visit and interact with their website. Blocking AI tools is consistent with that.
With a lot of AI searches I actually end up reading the pages anyway, so I don’t know how much this stops that.
-
Can someone with more knowledge shed a bit more light on this whole situation? I'm out of the loop on the technical details.
AI crawlers tend to overwhelm websites by doing the least efficient scraping of data possible, basically DDoSing a huge portion of the internet. Perplexity already scraped the net for training data and is now hammering it inefficiently for searches.
Cloudflare is just trying to keep the bots from overwhelming everything.
-
You'd think that a competent technology company with their own AI would be able to figure out a way to spoof Cloudflare's checks. I'd still think that.
Or find a more efficient way to manage data, since their current approach is basically DDoSing the internet for training data and also for responding to user interactions.
-
Ooh, that's tough, sweetheart. If the owners of those servers want you to visit, they'll just choose a WAF other than CF's.
All zero of them.
-
Cry me a river
-
Can someone with more knowledge shed a bit more light on this whole situation? I'm out of the loop on the technical details.
Perplexity (an "AI search engine" company with 500 million in funding) can't bypass Cloudflare's anti-bot checks. For each search, Perplexity scrapes the top results and summarizes them for the user. Cloudflare intentionally blocks Perplexity's scrapers because they ignore robots.txt and mimic real users to get around Cloudflare's blocking features. Perplexity argues that their scraping is acceptable because it's user-initiated.
Personally, I think Cloudflare is in the right here. The scraped sites get zero revenue from Perplexity searches (unless the user decides to go through the sources section and click the links), and Perplexity's scraping is unnecessarily traffic-intensive since they don't cache the scraped data.
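To make the robots.txt and caching points concrete, here is a rough Python sketch of a fetcher that does both. It uses the standard library's `urllib.robotparser`. `ExampleBot`, `PageCache`, `polite_fetch` and the one-hour TTL are hypothetical names and values picked for illustration, and nothing here describes Perplexity's actual code.

```python
import time
from urllib.robotparser import RobotFileParser

USER_AGENT = "ExampleBot"  # hypothetical crawler name, announced honestly


def build_robots(robots_txt):
    """Parse the text of a robots.txt file that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class PageCache:
    """Tiny in-memory cache so repeated searches do not refetch the same page."""

    def __init__(self, ttl=3600.0, clock=time.monotonic):
        self.ttl = ttl        # seconds a cached page stays fresh (assumed value)
        self._clock = clock
        self._store = {}      # url -> (time fetched, body)

    def get_or_fetch(self, url, fetch):
        now = self._clock()
        hit = self._store.get(url)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]     # still fresh, no network request needed
        body = fetch(url)
        self._store[url] = (now, body)
        return body


def polite_fetch(url, robots, cache, fetch):
    """Return the page body, or None if robots.txt disallows this user agent."""
    if not robots.can_fetch(USER_AGENT, url):
        return None
    return cache.get_or_fetch(url, fetch)
```

Here `fetch` is whatever function actually performs the HTTP request. With a cache like this in front, ten users asking about the same article would cost the site one request per hour instead of ten.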
-
Oh no!
-
You'd think that a competent technology company with their own AI would be able to figure out a way to spoof Cloudflare's checks. I'd still think that.
Perplexity: "But that would cost us moneeyyyy!"
-
You'd think that a competent technology company with their own AI would be able to figure out a way to spoof Cloudflare's checks. I'd still think that.
see, but they're not competent. further, they don't care. most of these ai companies are snake oil. they're selling you a solution that doesn't meaningfully solve a problem. their main way of surviving is saying "this is what it can do now, just imagine what it can do if you invest money in my company."
they're scammers, the lot of them, running ponzi schemes with our money. if the planet dies for it, that's no concern of theirs. ponzi schemes require the schemer to have no long term plan, just a line of credit that they can keep drawing from until they skip town before the tax collector comes
-
Cloudflare runs as a CDN/cache/gateway service in front of a ton of websites. Their service helps protect against DDoS attacks and malicious traffic.
A few weeks ago Cloudflare announced they were going to block AI crawling (good, in my opinion). However, they also added a paid service that these AI crawlers can use, so it actually becomes a revenue source for them.
This is a response to that from Perplexity, an AI search company. I don’t actually know how their service works, but they were specifically called out in the announcement, and Cloudflare accused them of “stealth scraping,” ignoring robots.txt, and other things.
A few weeks ago Cloudflare announced they were going to block AI crawling (good, in my opinion). However, they also added a paid service that these AI crawlers can use, so it actually becomes a revenue source for them.
I think it's also worth pointing out that all of the big AI companies are currently burning through cash at an absolutely astonishing rate, and none of them are anywhere close to being profitable. So pay-walling the data they use is probably gonna be pretty painful for their already-tortured bottom line (good).