The AI company Perplexity is complaining their bots can't bypass Cloudflare's firewall

  • The amount of people just reacting to the headline in the comments on these kinds of articles is always surprising.

    Your browser acts as an agent too, you don’t manually visit every script link, image source and CSS file. Everyone has experienced how annoying it is to have your browser be targeted by Cloudflare.

    There’s a pretty major difference between a human user loading a page and having it summarized and a bot that is scraping 1500 pages/second.

    Cheering for Cloudflare to be the arbiter of what technologies are allowed is incredibly short sighted. They exist to provide their clients with services, including bot mitigation. But a user initiated operation isn’t the same as a bot.

    Which is the point of the article and the article’s title.

    It isn’t clear why OP had to alter the headline to bait the anti-ai crowd.

    But a user initiated operation isn’t the same as a bot.

    Oh fuck off with that AI company propaganda.

    The AI companies already overwhelmed sites to get training data and are repeating their shitty scraping practices when users interact with their AI. It's the same fucking thing.

    Web crawlers for search engines don't scrape pages every time a user searches like AI does. Both web crawlers and scrapers are bots, and how a human initiates their operation, scheduled or not, doesn't matter as much as the fact that they do things very differently and only one of the two respects robots.txt.

  • On the flip side, most websites are so ad-ridden these days a reader mode or other summary tool is almost required for normal browsing. Not saying that AI is the right move, but I can understand not wanting to visit the actual page any more.

    On the flip side, most websites are so ad-ridden these days a reader mode or other summary tool is almost required for normal browsing.

    Firefox with uBlock Origin works perfectly fine and pages load faster without the ads!

  • A few weeks ago cloudflare announced they were going to block AI crawling (good, in my opinion). However they also added a paid service that these AI crawlers can use, so it actually becomes a revenue source for them.

    I think it's also worth pointing out that all of the big AI companies are currently burning through cash at an absolutely astonishing rate, and none of them are anywhere close to being profitable. So pay-walling the data they use is probably gonna be pretty painful for their already-tortured bottom line (good).

    It's more than simply astonishing, it's mind-blowingly bonkers how much money they have to burn to see ANY amount of return. You think a normal company is bad, blowing a few thousand bucks on materials, equipment, and labor per day in order to make a few bucks revenue (not profit)? AI companies have to blow HUNDREDS OF BILLIONS on massive data center complexes in order to train their bots, and then the energy cost and water cost of running them adds a couple more million a day. ALL so they can make negative hundreds of dollars on every prompt you can dream of.

    The ONLY reason AI firms are still a thing in the current tech tree is because Techbros everywhere have convinced the uberwealthy VC firms that AGI is RIGHT AROUND THE CORNER, and will save them SO much money on labor and efficiency that it'll all be worth it in permanent, pure, infinite profit. If that sounds like too much of a pipe dream to be realistic, congratulations, you're a sane and rational human being.

  • So sad for them. Try not living in a war zone?

    It isn’t a war zone, it’s a gated community where the guards have suddenly decided that any vehicle made after 2020 is full of thieves.

    They didn’t bother to consult the residents or give them the ability to opt out of having their dinner guests murdered for driving a vehicle the security guards don’t like.

  • It isn’t a war zone, it’s a gated community where the guards have suddenly decided that any vehicle made after 2020 is full of thieves.

    They didn’t bother to consult the residents or give them the ability to opt out of having their dinner guests murdered for driving a vehicle the security guards don’t like.

    So you're a cloudflare customer and you wish they would let the perplexity traffic multiplier through to your website? You can leave cloudflare any time you want.

  • The amount of people just reacting to the headline in the comments on these kinds of articles is always surprising.

    Your browser acts as an agent too, you don’t manually visit every script link, image source and CSS file. Everyone has experienced how annoying it is to have your browser be targeted by Cloudflare.

    There’s a pretty major difference between a human user loading a page and having it summarized and a bot that is scraping 1500 pages/second.

    Cheering for Cloudflare to be the arbiter of what technologies are allowed is incredibly short sighted. They exist to provide their clients with services, including bot mitigation. But a user initiated operation isn’t the same as a bot.

    Which is the point of the article and the article’s title.

    It isn’t clear why OP had to alter the headline to bait the anti-ai crowd.

    In a better timeline, we wouldn't need to cheer the victory of one megacorporation over another, they would both be the losers. But also people are still capable of holding two thoughts simultaneously.

    For instance, we'd all be happy to see Apple lose the Epic Games lawsuit and be forced out of their monopoly on app stores on iOS. But those same people are aware it would allow Epic to continue being a disgusting company.

    bait the anti-ai crowd

    Oh I see lol

  • Cloudflare runs as a CDN/cache/gateway service in front of a ton of websites. Their service is to help protect against DDOS and malicious traffic.

    A few weeks ago cloudflare announced they were going to block AI crawling (good, in my opinion). However they also added a paid service that these AI crawlers can use, so it actually becomes a revenue source for them.

    This is a response to that from Perplexity who run an AI search company. I don’t actually know how their service works, but they were specifically called out in the announcement and Cloudflare accused them of “stealth scraping” and ignoring robots.txt and other things.

    they don't outright block ai crawlers. they added some new tools and options for managing or blocking ai bot traffic which the cloudflare customer can choose to use or to not use.

    im running a free educational resource and i let the crawlers hit my site all they want because its useful knowledge unavailable anywhere else and it's served to them from cloudflare's free tier cache. i just don't know why they have to read it ten thousand times a day.

  • I can’t get over their CEO that looks like a nine year old. Not sure what it is about him

  • Just buy cloudflare duh

    The anti-AI shield and bot-fight mode are free, you don't need to pay anything to use them.

  • Fuck that. I don't need prosecutors and the courts to rule that accessing publicly available information in a way that the website owner doesn't want is literally a crime. That logic would extend to ad blockers and editing HTML/JS in an "inspect element" tab.

    That logic would not extend to ad blockers, as the point of concern is gaining unauthorized access to a computer system or asset. Blocking ads would not be considered gaining unauthorized access to anything. In fact it would be the opposite of that.

  • Cloudflare runs as a CDN/cache/gateway service in front of a ton of websites. Their service is to help protect against DDOS and malicious traffic.

    A few weeks ago cloudflare announced they were going to block AI crawling (good, in my opinion). However they also added a paid service that these AI crawlers can use, so it actually becomes a revenue source for them.

    This is a response to that from Perplexity who run an AI search company. I don’t actually know how their service works, but they were specifically called out in the announcement and Cloudflare accused them of “stealth scraping” and ignoring robots.txt and other things.

    It should be pointed out that Cloudflare didn't say they were going to block AI traffic, they give you the option to. The service is a free opt-in for people who want it.

  • rare cloudflare w

  • On the flip side, most websites are so ad-ridden these days a reader mode or other summary tool is almost required for normal browsing. Not saying that AI is the right move, but I can understand not wanting to visit the actual page any more.

    Maybe I missed something, but ublock still works just fine for me, even on mobile. And running a pihole, while not trivial, also takes care of some ad traffic. Firefox comes with a reader mode (a feature I really like even with the adblockers!).

    So why do people not want to visit pages anymore, if all these tools already exist?

  • The amount of people just reacting to the headline in the comments on these kinds of articles is always surprising.

    Your browser acts as an agent too, you don’t manually visit every script link, image source and CSS file. Everyone has experienced how annoying it is to have your browser be targeted by Cloudflare.

    There’s a pretty major difference between a human user loading a page and having it summarized and a bot that is scraping 1500 pages/second.

    Cheering for Cloudflare to be the arbiter of what technologies are allowed is incredibly short sighted. They exist to provide their clients with services, including bot mitigation. But a user initiated operation isn’t the same as a bot.

    Which is the point of the article and the article’s title.

    It isn’t clear why OP had to alter the headline to bait the anti-ai crowd.

    Cheering for Cloudflare to be the arbiter of what technologies are allowed is incredibly short sighted. They exist to provide their clients with services, including bot mitigation.

    Well I suppose it's a good thing then that the anti-AI shield is opt-in, and Cloudflare isn't making any decisions for anyone on whether or not AI scrapers get to visit their pages. That little bit of context makes your entire argument fall apart.

  • But a user initiated operation isn’t the same as a bot.

    Oh fuck off with that AI company propaganda.

    The AI companies already overwhelmed sites to get training data and are repeating their shitty scraping practices when users interact with their AI. It's the same fucking thing.

    Web crawlers for search engines don't scrape pages every time a user searches like AI does. Both web crawlers and scrapers are bots, and how a human initiates their operation, scheduled or not, doesn't matter as much as the fact that they do things very differently and only one of the two respects robots.txt.

    There’s no difference in server load between a user looking at a page and a user using an AI tool to summarize the page.

    The AI companies already overwhelmed sites to get training data and are repeating their shitty scraping practices when users interact with their AI. It’s the same fucking thing.

    You either didn’t read the article or are deliberately making bad faith arguments. The entire point of the article is that the traffic that they’re referring to is initiated by a user, just like when you type an address into your browser’s address bar.

    This traffic, initiated by a user, creates the same server load as that same user loading the page in a browser.

    Yes, mass scraping of web pages creates a bunch of server load. This was the case before AI was even a thing.

    This situation is like Cloudflare presenting a captcha in order to load each individual image, CSS or JavaScript asset into a web browser because bot traffic pretends to be a browser.

    I don’t think it’s too hard to understand that a bot pretending to be a browser and a human operated browser are two completely different things and classifying them as the same (and captchaing them) would be a classification error.

    This is exactly the same kind of error. Even if you personally believe that users using AI tools should be blocked, not everyone has the same opinion. If Cloudflare can’t distinguish between bot requests and human requests then their customers can’t opt out and allow their users to use AI tools even if they want to.

  • Cheering for Cloudflare to be the arbiter of what technologies are allowed is incredibly short sighted. They exist to provide their clients with services, including bot mitigation.

    Well I suppose it's a good thing then that the anti-AI shield is opt-in, and Cloudflare isn't making any decisions for anyone on whether or not AI scrapers get to visit their pages. That little bit of context makes your entire argument fall apart.

    It isn’t opt in.

    You can block all bot page scraping, and also block user initiated AI tools or you can block no traffic.

    There isn’t an option to block bot page scraping but allow user initiated AI tools.

    Because, as the article points out, Cloudflare is not able to distinguish between the two

  • It's difficult to be a shittier company than OpenAI, but Perplexity seems to be trying hard.

    Step 1, SOMEHOW find a more punchable face than Altman

  • So you're a cloudflare customer and you wish they would let the perplexity traffic multiplier through to your website? You can leave cloudflare any time you want.

    🙄You’re an Internet user and you don’t like AI so you can leave the Internet anytime you want.

    That’s not a good argument, what about the users who want to block mass scraping but want to make their content available to users who are using these tools? Cloudflare exists because it allows legitimate traffic, that websites want, and blocks mass scraping which the sites don’t want.

    If they’re not able to distinguish mass scraping traffic from user created traffic then they’re blocking legitimate users that some website owners want.

  • Skill issue. Cope and seethe

  • The anti-AI shield and bot-fight mode are free, you don't need to pay anything to use them.

    No I'm telling Perplexity, they can just buy their obstacle

    People who use the things you have described, for free
    are themselves the products being sold
    this is implied in the price
