linux-nerds.org

The AI company Perplexity is complaining their bots can't bypass Cloudflare's firewall

Technology

223 posts, 121 commenters, 14 views

In reply to ubergeek@lemmy.today:

How "open" a website is, is up to the owner, and that's all. Unless we're talking about de-privatizing the internet as a whole, here.
tomalley8342@lemmy.world wrote:

#197

How “open” a website is, is up to the owner, and that’s all.

As someone who registered this account on this platform in response to Reddit's API restrictions, it would be hypocritical of me to accept such a belief.

0
In reply to tempest@lemmy.ca:

CloudFlare has become an Internet protection racket and I'm not happy about it.
laser@feddit.org wrote:

#198

It's been like this from the very beginning. But they don't fit the definition of a protection racket, as they're not the ones attacking you if you don't pay up. So they're more like a security company that has no competitors because of the investment needed to operate.

12
In reply to drmoose@lemmy.world:

lmao imagine shilling for corporate Cloudflare like this. Also, false positives and false negatives are fundamentally not equal.

Cloudflare is probably aware that there are still some false positives, and probably is working on it as we write.

The main issue with Cloudflare is that it's mostly bullshit. It does not report any stats to the admins on how many users were rejected or any false positive rates, and happily puts everyone under the "evil bot" umbrella. So people from low trust score environments like Linux or IPs from poorer countries are at a significant disadvantage and left without a voice.

I'm literally a security dev working with Cloudflare anti-bot myself (not by choice). It's a useful tool for corporate but a really fucking bad one for the health of the web, much worse than any LLM agent or crawler, period.
laser@feddit.org wrote:

#199

So people from low trust score environments like Linux

Linux user here, Cloudflare hasn't blocked access to a single page for me unless I use a VPN, which then can trigger it.

1
In reply to tomalley8342@lemmy.world:

How “open” a website is, is up to the owner, and that’s all.

As someone who registered this account on this platform in response to Reddit's API restrictions, it would be hypocritical of me to accept such a belief.
ubergeek@lemmy.today wrote:

#200

Well, until we abolish capitalism, that's the state of things. Unless you feel like Nazis MUST be freely given access to everything too?

0
In reply to drmoose@lemmy.world:

It's insane that anyone would side with Cloudflare here. To this day I can't visit many websites like nexusmods just because I run Firefox on Linux. The Cloudflare Turnstile just refreshes infinitely and has been doing so for months now.

Cloudflare is the biggest cancer on the web, fucking burn it.
catdogl0ver@lemmy.world wrote:

#201

It happened to me before, until I did a Google search. It was my VPN's web protection. It was too "overprotective".

Check your security settings, antivirus, and VPN.

0
In reply to int32@lemmy.dbzer0.com:

They can use web.archive.org as a CDN (I do that with Cloudflare websites). But honestly, Cloudflare or not, the internet is broken.
turmoil@feddit.org wrote:

#202

Using archive.org as a CDN at the scale of Cloudflare would be an immediate death sentence for archive.org.

1
In reply to threeganzi@sh.itjust.works:

Does it not need to be scraped to be indexed, assuming it’s semi-typical RAG stuff?
electricd@lemmybefree.net wrote:

#203

I assume their script does some search engine stuff, like querying Google or Bing, and then "scrapes" the links it follows.

Some Selenium stuff.

0
In reply to tempest@lemmy.ca:

CloudFlare has become an Internet protection racket and I'm not happy about it.
electricd@lemmybefree.net wrote (last edited by electricd@lemmybefree.net):

#204

they're good at protecting websites but damn, having a company being MITM feels so wrong

3
In reply to thegrandnagus@lemmy.world:

Can't believe I've lived to see Cloudflare be the good guys
sunbeam60@lemmy.ml wrote:

#205

They’re not. They’re using this as an excuse to become paid gatekeepers of the internet as we know it. All that’s happening is that Cloudflare is using this to maneuver into a position where they can say “nice traffic you’ve got there - would be a shame if something happened to it”.

AI companies are crap.

What Cloudflare is doing here is also crap.

And we’re cheering it on.

6
In reply to ubergeek@lemmy.today:

Well, until we abolish capitalism, that's the state of things. Unless you feel like Nazis MUST be freely given access to everything too?
tomalley8342@lemmy.world wrote:

#206

Well, until we abolish capitalism, that’s the state of things.

I can see that things are the way things are. Accepting it is a different matter.

Unless you feel like Nazis MUST be freely given access to everything too?

To me, the "access" that I am referring to (the interface with which you gain access to a service) and that "access" (your behavior once you have gained access to a service) are different topics. The same distinction can be made with the concern over DoS attacks mentioned earlier in the thread. The user's behavior of overwhelming a site's traffic is the root concern, not the interface that the user is connecting with.

0
In reply to pressanykeynow@lemmy.world:

Can you explain please? How can I use archive.org as a cdn for my website?
int32@lemmy.dbzer0.com wrote:

#207

just take a snapshot of your website...
then make all links to your website link to that snapshot, and turn your server off.
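
The "link to that snapshot" step can be sketched roughly as below. This is only an illustration, not anything the poster shared: it assumes the commonly seen Wayback Machine URL layout of `https://web.archive.org/web/<YYYYMMDDhhmmss>/<original URL>`, and the `wayback_link` helper, the example.com address, and the snapshot time are all hypothetical. Taking the snapshot itself is not shown.

```python
from datetime import datetime, timezone

# Assumed URL prefix for archived pages on the Wayback Machine.
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_link(original_url: str, snapshot_time: datetime) -> str:
    """Build a link to an archived copy of original_url.

    snapshot_time must be timezone-aware; it is converted to UTC and
    formatted as the 14-digit YYYYMMDDhhmmss timestamp used in the URL.
    """
    if snapshot_time.tzinfo is None:
        raise ValueError("snapshot_time must be timezone-aware")
    stamp = snapshot_time.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{stamp}/{original_url}"


if __name__ == "__main__":
    # Hypothetical page and snapshot time, for illustration only.
    taken = datetime(2025, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
    print(wayback_link("https://example.com/about", taken))
```

Every request then lands on archive.org instead of your own server, which is exactly the load concern raised elsewhere in the thread.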

0
In reply to turmoil@feddit.org:

Using archive.org as a CDN at the scale of Cloudflare would be an immediate death sentence for archive.org.
int32@lemmy.dbzer0.com wrote:

#208

well I'm doing my part: https://addons.mozilla.org/en-US/firefox/addon/bcma/
sorry archive.org, I promise I'll donate

0
In reply to ubergeek@lemmy.today:

I think it boils down to "consent" and "remuneration".

I run a website that I do not consent to being accessed for LLMs. However, should LLMs use my content, I should be compensated for such use.

So, these LLM startups ignore both consent and the idea of remuneration.

Most of these concepts have already been figured out for the purposes of law, if we consider websites akin to real estate: then the typical trespass laws, compensatory usage, and hell, even eminent domain would apply if needed (i.e., a city government can "take over" the boosted post feature to make sure alerts get pushed as widely and quickly as possible).
rdri@lemmy.world wrote:

#209

That all sounds very vague to me, and I don't expect it to be captured properly by law any time soon. Being accessed for LLMs? What does that mean to you, and how is it different from being accessed by a user? Imagine you host a weather forecast. If that information is public, what kind of compensation do you expect from anyone or anything that accesses that data?

Is it okay for a person to access your site? Is it okay for a script written by that person to fetch data every day automatically? Would it be okay for a user to dump a page of your site with a headless browser? Would it be okay to let an LLM take a look at it to extract info required by a user? Have you heard about the changedetection.io project? If some of these sound unfair to you, you might want to put DRM on your data or something.

Would you expect compensation from me after reading your comment?

0
In reply to int32@lemmy.dbzer0.com:

just take a snapshot of your website...
then make all links to your website link to that snapshot, and turn your server off.
pressanykeynow@lemmy.world wrote:

#210

Oh, well, it's okay if it suits you. It's just not at all an alternative to Cloudflare.

0
In reply to electricd@lemmybefree.net:

they're good at protecting websites but damn, having a company being MITM feels so wrong
bathing_in_bismuth@sh.itjust.works wrote:

#211

The shit they know. Plus their support for non-JS users or Tor is pure shite.

1
In reply to pressanykeynow@lemmy.world:

Oh, well, it's okay if it suits you. It's just not at all an alternative to Cloudflare.
int32@lemmy.dbzer0.com wrote:

#212

I have an alternative to Cloudflare: it's sitting in my living room and it's called a Raspberry Pi.

0
In reply to drmoose@lemmy.world:

lmao imagine shilling for corporate Cloudflare like this. Also, false positives and false negatives are fundamentally not equal.

Cloudflare is probably aware that there are still some false positives, and probably is working on it as we write.

The main issue with Cloudflare is that it's mostly bullshit. It does not report any stats to the admins on how many users were rejected or any false positive rates, and happily puts everyone under the "evil bot" umbrella. So people from low trust score environments like Linux or IPs from poorer countries are at a significant disadvantage and left without a voice.

I'm literally a security dev working with Cloudflare anti-bot myself (not by choice). It's a useful tool for corporate but a really fucking bad one for the health of the web, much worse than any LLM agent or crawler, period.
dremor@lemmy.world wrote:

#213

Ah, the good old "you don't agree with me so you must be shilling for X" argument. I suppose you are shilling for the bots then, am I right?

0
In reply to gamingchairmodel@lemmy.world:

gaining unauthorized access to a computer system

And my point is that defining "unauthorized" to include visitors using unauthorized tools/methods to access a publicly visible resource would be a policy disaster.

If I put a banner on my site that says "by visiting my site you agree not to modify the scripts or ads displayed on the site," does that make my visit with an ad blocker "unauthorized" under the CFAA? I think the answer should obviously be "no," and that the way to define "authorization" is whether the website puts up some kind of login/authentication mechanism to block or allow specific users, not to put a simple request to the visiting public to please respect the rules of the site.

To me, a robots.txt is more like a friendly request to unauthenticated visitors than it is a technical implementation of some kind of authentication mechanism.

Scraping isn't hacking. I agree with the Third Circuit and the EFF: If the website owner makes a resource available to visitors without authentication, then accessing those resources isn't a crime, even if the website owner didn't intend for site visitors to use that specific method.
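
To make the "friendly request" point about robots.txt concrete, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The rules, the `ExampleBot` and `SomeBrowser` user agents, and the example.com URLs are all hypothetical. The check only tells a cooperating client what the site owner asked for; nothing in it prevents a client that never consults the file from fetching the page anyway.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; a real crawler would download this
# file from the site it wants to visit.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the hypothetical rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A well-behaved client asks before fetching; compliance is voluntary.
    for agent, url in [
        ("ExampleBot", "https://example.com/index.html"),
        ("SomeBrowser", "https://example.com/index.html"),
        ("SomeBrowser", "https://example.com/private/page"),
    ]:
        print(agent, url, is_allowed(agent, url))
```

An actual access control would instead have the server reject the request, for example behind a login.
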
finitebanjo@lemmy.world wrote:

#214

Site owners currently do and should have the freedom to decide who is and is not allowed to access the data, and to decide what purpose it gets used for. Idgaf whether you think scraping is malicious or not; it is and should be illegal to violate clear and obvious barriers against it, at the cost of the owners and for the unsanctioned profit of scrapers off the work of the site owners.

0
In reply to lime@feddit.nu:

yeah it's almost like there was already a system for this in place
seraphine@lemmy.blahaj.zone wrote:

#215

THE CAKE DAY IS NOW. (I don't have an image at hand)

0
In reply to seraphine@lemmy.blahaj.zone:

THE CAKE DAY IS NOW. (I don't have an image at hand)
lime@feddit.nu wrote (last edited by lime@feddit.nu):

#216

i really wish we wouldn't do those. feels too reddity.

but thanks.

1
