The AI company Perplexity is complaining their bots can't bypass Cloudflare's firewall
-
It's insane that anyone would side with Cloudflare here. To this day I can't visit many websites like nexusmods just because I run Firefox on Linux. The Cloudflare Turnstile just refreshes infinitely and has been doing so for months now.
Cloudflare is the biggest cancer on the web, fucking burn it.
It happened to me too, until I did a Google search. It was my VPN's web protection. It was too "overprotective".
Check your security settings, antivirus and VPN.
-
They can use web.archive.org as a CDN (I do that for Cloudflare websites). But honestly, Cloudflare or not, the internet is broken.
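For anyone wondering what "using web.archive.org as a CDN" amounts to: you point links at the Wayback Machine's copy of a page instead of the origin server. Here is a minimal sketch, assuming the usual `https://web.archive.org/web/<timestamp>/<url>` snapshot URL layout. `wayback_url` is a hypothetical helper, not part of any library, and it only builds the string without checking that a capture exists.

```python
from urllib.parse import urlsplit

# Assumed snapshot URL layout: <prefix>/<timestamp>/<original URL>
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(original_url: str, timestamp: str) -> str:
    """Build a Wayback Machine snapshot URL for original_url.

    timestamp is a string of 4 to 14 digits (YYYY up to YYYYMMDDhhmmss).
    No network request is made here.
    """
    if not (timestamp.isdigit() and 4 <= len(timestamp) <= 14):
        raise ValueError("timestamp must be 4 to 14 digits")
    parts = urlsplit(original_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("original_url must be an absolute http(s) URL")
    return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"


if __name__ == "__main__":
    print(wayback_url("https://example.com/page", "20240101000000"))
```

As far as I know the Wayback Machine redirects to the capture closest to the timestamp you give, so an approximate value should be enough.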
Using archive.org as a CDN at the scale of Cloudflare would be an immediate death sentence for archive.org.
-
Does it not need to be scraped to be indexed, assuming it’s semi-typical RAG stuff?
I assume their script does some search engine stuff, like querying Google or Bing, and then "scrapes" the links it finds.
Some Selenium stuff.
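Nobody here knows how Perplexity's pipeline actually works, so treat this as a purely hypothetical sketch of the "search, then fetch the result links" pattern guessed at above. `search` and `fetch` are made-up injected callables standing in for a search API and for an HTTP client or a Selenium-driven browser; `gather_context` and `html_to_text` are illustrative names too.

```python
from html.parser import HTMLParser
from itertools import islice
from typing import Callable, Dict, Iterable


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the bodies of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Reduce an HTML document to its visible text, space-separated."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def gather_context(
    query: str,
    search: Callable[[str], Iterable[str]],  # hypothetical: query -> result URLs
    fetch: Callable[[str], str],             # hypothetical: URL -> raw HTML
    max_pages: int = 3,
) -> Dict[str, str]:
    """Fetch the first max_pages search results and return {url: page text}."""
    pages = {}
    for url in islice(search(query), max_pages):
        pages[url] = html_to_text(fetch(url))
    return pages
```

The text that comes back is what would get handed to the model as context. Whether the fetch step is plain HTTP or a full headless browser is exactly the part a firewall like Cloudflare's tries to tell apart from a human visitor.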
-
Cloudflare has become an Internet protection racket and I'm not happy about it.
They're good at protecting websites but damn, having a company act as a MITM feels so wrong.
-
Can't believe I've lived to see Cloudflare be the good guys
They’re not. They’re using this as an excuse to become paid gatekeepers of the internet as we know it. All that’s happening is that Cloudflare is using this to maneuver into a position where they can say “nice traffic you’ve got there - would be a shame if something happened to it”.
AI companies are crap.
What Cloudflare is doing here is also crap.
And we’re cheering it on.
-
Well, until we abolish capitalism, that's the state of things. Unless you feel like Nazis MUST be freely given access to everything too?
Well, until we abolish capitalism, that’s the state of things.
I can see that things are the way things are. Accepting it is a different matter.
Unless you feel like Nazis MUST be freely given access to everything too?
To me, the "access" that I am referring to (the interface with which you gain access to a service) and that "access" (your behavior once you have gained access to a service) are different topics. The same distinction can be made with the concern over DoS attacks mentioned earlier in the thread. The user's behavior of overwhelming a site's traffic is the root concern, not the interface that the user is connecting with.
-
Can you explain please? How can I use archive.org as a cdn for my website?
just take a snapshot of your website...
then make all links to your website link to that snapshot, and turn your server off.
-
Using archive.org as a CDN at the scale of Cloudflare would be an immediate death sentence for archive.org.
well I'm doing my part: https://addons.mozilla.org/en-US/firefox/addon/bcma/
sorry archive.org, I promise I'll donate
-
I think it boils down to "consent" and "remuneration".
I run a website, and I do not consent to it being accessed for LLMs. And if LLMs do use my content anyway, I should be compensated for that use.
So, these LLM startups ignore both consent and the idea of remuneration.
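On the consent half: the usual way a site states it is robots.txt, which is only advisory, so a crawler has to choose to honor it. Here is a minimal sketch of what honoring it looks like on the crawler side, using Python's standard `urllib.robotparser`. The bot name `ExampleLLMBot`, the example.com URL and the rules themselves are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one LLM crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleLLMBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(may_fetch("ExampleLLMBot", "https://example.com/post/1"))
    print(may_fetch("Mozilla/5.0", "https://example.com/post/1"))
```

Nothing in this enforces anything. A bot that skips the check, or sends a different user agent, gets the page regardless, which is the gap that firewalls step into.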
Most of these concepts have already been figured out in law, if we treat websites much like real estate: the usual trespass laws, compensation for use, and hell, even eminent domain if needed, i.e., a city government can "take over" the boosted post feature to make sure alerts get pushed as widely and quickly as possible.
That all sounds very vague to me, and I don't expect it to be captured properly by law any time soon. Being accessed for LLMs? What does that mean to you, and how is it different from being accessed by a user? Imagine you host a weather forecast. If that information is public, what kind of compensation do you expect from anyone or anything that accesses that data?
Is it okay for a person to access your site? Is it okay for a script written by that person to fetch data every day automatically? Would it be okay for a user to dump a page of your site with a headless browser? Would it be okay to let an LLM take a look at it to extract info required by a user? Have you heard about the changedetection.io project? If some of these sound unfair to you, you might want to put DRM on your data or something.
Would you expect compensation from me after I read your comment?
-
just take a snapshot of your website...
then make all links to your website link to that snapshot, and turn your server off.
Oh, well, it's okay if that suits you. It's just not an alternative to Cloudflare at all.
-
They're good at protecting websites but damn, having a company act as a MITM feels so wrong.
The shit they know. Plus their support for non-JS users or Tor is pure shite.