The AI company Perplexity is complaining their bots can't bypass Cloudflare's firewall
-
How "open" a website is, is up to the owner, and that's all. Unless we're talking about de-privatizing the internet as a whole, here.
How “open” a website is, is up to the owner, and that’s all.
As someone who registered this account on this platform in response to Reddit's API restrictions, it would be hypocritical of me to accept such a belief.
-
Cloudflare has become an Internet protection racket and I'm not happy about it.
It's been this from the very beginning. But they don't fit the definition of a protection racket as they're not the ones attacking you if you don't pay up. So they're more like a security company that has no competitors due to the needed investment to operate.
-
lmao imagine shilling for corporate Cloudflare like this. Also, false positives and false negatives are fundamentally not equal.
Cloudflare is probably aware that there are still some false positives, and is probably working on it as we write.
The main issue with Cloudflare is that it's mostly bullshit. It does not report any stats to the admins on how many users were rejected or any false positive rates, and happily puts everyone under the "evil bot" umbrella. So people from low trust score environments like Linux or IPs from poorer countries are at a significant disadvantage and left without a voice.
I'm literally a security dev working with Cloudflare anti-bot myself (not by choice). It's a useful tool for corporate but a really fucking bad one for the health of the web, much worse than any LLM agent or crawler, period.
So people from low trust score environments like Linux
Linux user here, Cloudflare hasn't blocked access to a single page for me unless I use a VPN, which then can trigger it.
-
How “open” a website is, is up to the owner, and that’s all.
As someone who registered this account on this platform in response to Reddit's API restrictions, it would be hypocritical of me to accept such a belief.
Well, until we abolish capitalism, that's the state of things. Unless you feel like Nazis MUST be freely given access to everything too?
-
It's insane that anyone would side with Cloudflare here. To this day I can't visit many websites like nexusmods just because I run Firefox on Linux. The Cloudflare turnstile just refreshes infinitely and has for months now.
Cloudflare is the biggest cancer on the web, fucking burn it.
It happened to me before, until I did a Google search. It was my VPN web protection. It was too "overprotective".
Check your security settings, antivirus and VPN.
-
They can use web.archive.org as a CDN (I do that for Cloudflare websites). But honestly, Cloudflare or not, the internet is broken.
Using archive.org as a CDN at the scale of Cloudflare would be an immediate death sentence for archive.org.
-
Does it not need to be scraped to be indexed, assuming it’s semi-typical RAG stuff?
I assume their script does some search engine stuff, like querying Google or Bing, and then "scrapes" the links it lands on.
Some Selenium stuff.
-
Cloudflare has become an Internet protection racket and I'm not happy about it.
they're good at protecting websites but damn, having a company being MITM feels so wrong
-
Can't believe I've lived to see Cloudflare be the good guys
They’re not. They’re using this as an excuse to become paid gatekeepers of the internet as we know it. All that’s happening is that Cloudflare is using this to maneuver into a position where they can say “nice traffic you’ve got there - would be a shame if something happened to it”.
AI companies are crap.
What Cloudflare is doing here is also crap.
And we’re cheering it on.
-
Well, until we abolish capitalism, that's the state of things. Unless you feel like Nazis MUST be freely given access to everything too?
Well, until we abolish capitalism, that’s the state of things.
I can see that things are the way things are. Accepting it is a different matter.
Unless you feel like Nazis MUST be freely given access to everything too?
To me, the "access" that I am referring to (the interface with which you gain access to a service) and that "access" (your behavior once you have gained access to a service) are different topics. The same distinction can be made with the concern over DoS attacks mentioned earlier in the thread. The user's behavior of overwhelming a site's traffic is the root concern, not the interface that the user is connecting with.
-
Can you explain, please? How can I use archive.org as a CDN for my website?
just take a snapshot of your website...
then make all links to your website link to that snapshot, and turn your server off.
-
Using archive.org as a CDN at the scale of Cloudflare would be an immediate death sentence for archive.org.
well I'm doing my part: https://addons.mozilla.org/en-US/firefox/addon/bcma/
sorry archive.org, I promise I'll donate
-
I think it boils down to "consent" and "remuneration".
I run a website that I do not consent to being accessed for LLMs. However, should LLMs use my content, I should be compensated for such use.
So, these LLM startups ignore both consent and the idea of remuneration.
Most of these concepts have already been worked out in law, if we treat websites as akin to real estate: the usual trespass laws, compensation for use, and hell, even eminent domain if needed, i.e., a city government can "take over" the boosted post feature to make sure alerts get pushed as widely and quickly as possible.
That all sounds very vague to me, and I don't expect it to be captured properly by law any time soon. Being accessed for LLM? What does it mean for you and how is it different from being accessed by a user? Imagine you host a weather forecast. If that information is public, what kind of compensation do you expect from anyone or anything who accesses that data?
Is it okay for a person to access your site? Is it okay for a script written by that person to fetch data every day automatically? Would it be okay for a user to dump a page of your site with a headless browser? Would it be okay to let an LLM take a look at it to extract info required by a user? Have you heard about the changedetection.io project? If some of these sound unfair to you, you might want to put DRM on your data or something.
Would you expect compensation from me after reading your comment?
-
just take a snapshot of your website...
then make all links to your website link to that snapshot, and turn your server off.
Oh, well, it's okay if it suits you. Just not at all an alternative to Cloudflare.
-
they're good at protecting websites but damn, having a company being MITM feels so wrong
The shit they know. Plus their support for non-JS users or Tor is pure shite.
-
Oh, well, it's okay if it suits you. Just not at all an alternative to Cloudflare.
I have an alternative to Cloudflare, it's sitting in my living room and it's called a Raspberry Pi.
-
lmao imagine shilling for corporate Cloudflare like this. Also, false positives and false negatives are fundamentally not equal.
Cloudflare is probably aware that there are still some false positives, and is probably working on it as we write.
The main issue with Cloudflare is that it's mostly bullshit. It does not report any stats to the admins on how many users were rejected or any false positive rates, and happily puts everyone under the "evil bot" umbrella. So people from low trust score environments like Linux or IPs from poorer countries are at a significant disadvantage and left without a voice.
I'm literally a security dev working with Cloudflare anti-bot myself (not by choice). It's a useful tool for corporate but a really fucking bad one for the health of the web, much worse than any LLM agent or crawler, period.
Ah, the good old "you don't agree with me so you must be shilling for X" argument. I suppose you are shilling for the bots then, am I right?
-
gaining unauthorized access to a computer system
And my point is that defining "unauthorized" to include visitors using unauthorized tools/methods to access a publicly visible resource would be a policy disaster.
If I put a banner on my site that says "by visiting my site you agree not to modify the scripts or ads displayed on the site," does that make my visit with an ad blocker "unauthorized" under the CFAA? I think the answer should obviously be "no," and that the way to define "authorization" is whether the website puts up some kind of login/authentication mechanism to block or allow specific users, not to put a simple request to the visiting public to please respect the rules of the site.
To me, a robots.txt is more like a friendly request to unauthenticated visitors than it is a technical implementation of some kind of authentication mechanism.
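To make the "friendly request" point concrete, here is a minimal sketch of how a well-behaved crawler might consult robots.txt, using Python's standard `urllib.robotparser`. The robots.txt content, the bot name `ExampleBot`, and the `is_allowed` helper are all made up for illustration. Note that the check runs entirely on the client side: a crawler that simply never calls it gets the same HTTP responses as one that does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the bot name and paths are invented
# for this example and do not come from any real site.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url.

    This is a voluntary, client-side check: nothing on the server
    enforces the answer.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed("ExampleBot", "https://example.com/page"))
    print(is_allowed("OtherBot", "https://example.com/page"))
    print(is_allowed("OtherBot", "https://example.com/private/x"))
```

Nothing here authenticates anyone, which is the distinction being drawn: the file states the owner's wishes, and honoring them is up to the visitor.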
Scraping isn't hacking. I agree with the Third Circuit and the EFF: If the website owner makes a resource available to visitors without authentication, then accessing those resources isn't a crime, even if the website owner didn't intend for site visitors to use that specific method.
Site owners currently do and should have the freedom to decide who is and is not allowed to access the data, and to decide what purpose it gets used for. Idgaf if you think scraping is malicious or not: it is and should be illegal to get around clear and obvious barriers against scrapers, at the owners' expense and for the scrapers' unsanctioned profit off the site owners' work.
-
yeah it's almost like there was already a system for this in place
THE CAKE DAY IS NOW. (i don't have an image at hand)
-
THE CAKE DAY IS NOW. (i don't have an image at hand)
i really wish we wouldn't do those. feels too reddity.
but thanks.
-