linux-nerds.org

The AI company Perplexity is complaining their bots can't bypass Cloudflare's firewall

Technology

223 posts, 121 posters, 14 views

In reply to fauxliving@lemmy.world:

There’s no difference in server load between a user looking at a page and a user using an AI tool to summarize the page.

The AI companies already overwhelmed sites to get training data and are repeating their shitty scraping practices when users interact with their AI. It’s the same fucking thing.

You either didn’t read the article or are deliberately making bad faith arguments. The entire point of the article is that the traffic that they’re referring to is initiated by a user, just like when you type an address into your browser’s address bar.

This traffic, initiated by a user, creates the same server load as that same user loading the page in a browser.

Yes, mass scraping of web pages creates a bunch of server load. This was the case before AI was even a thing.

This situation is like Cloudflare presenting a captcha before loading each individual image, CSS, or JavaScript asset into a web browser, because bot traffic pretends to be a browser.

I don’t think it’s too hard to understand that a bot pretending to be a browser and a human-operated browser are two completely different things, and classifying them as the same (and captchaing them) would be a classification error.

This is exactly the same kind of error. Even if you personally believe that users using AI tools should be blocked, not everyone has the same opinion. If Cloudflare can’t distinguish between bot requests and human requests then their customers can’t opt out and allow their users to use AI tools even if they want to.

ubergeek@lemmy.today

#171

“There’s no difference in server load between a user looking at a page and a user using an AI tool to summarize the page.”

There is, in scale.

In reply to drmoose@lemmy.world:

It's insane that anyone would side with Cloudflare here. To this day I can't visit many websites, like Nexus Mods, just because I run Firefox on Linux. The Cloudflare Turnstile challenge just refreshes endlessly, and it has been doing so for months now.

Cloudflare is the biggest cancer on the web, fucking burn it.

dremor@lemmy.world

(last edited by dremor@lemmy.world)

#172

Linux and Firefox here. No problem at all with Cloudflare, despite running about as many privacy-preserving add-ons as possible. I even spoof my user agent to the latest Firefox ESR on Linux.

Something may be wrong with your setup.

In reply to amberskin@europe.pub:

Uh, are they admitting they are trying to circumvent technological protections set up to restrict access to a system?

Isn’t that a literal computer crime?

utopiah@lemmy.world

#173

(puts on evil hat) Cloudflare should DRM their protection, then DMCA Perplexity and other US-based "AI" companies into oblivion. Side effect: it might break the Internet.

In reply to dodos@lemmy.world:

I'm on Linux with Firefox and have never had that issue before (particularly nexusmods which I use regularly). Something else is probably wrong with your setup.

drmoose@lemmy.world

#174

"Wrong with my setup": that's not how the internet works.

I'm based in Southeast Asia and often work on the road, so my IP reputation is probably the final straw for my fingerprint score.

Either way, this should not be acceptable.

In reply to iphtashufitz@lemmy.world:

I hate to break it to you but not only does Cloudflare do this sort of thing, but so does Akamai, AWS, and virtually every other CDN provider out there. And far from being awful, it’s actually protecting the web.

We use Akamai where I work, and they inform us in real time when a request comes from a bot, and they further classify it as one of a dozen or so bots (search engine crawlers, analytics bots, advertising bots, social networks, AI bots, etc). It also informs us if it’s somebody impersonating a well known bot like Google, etc. So we can easily allow search engines to crawl our site while blocking AI bots, bots impersonating Google, and so on.
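The comment above mentions Akamai flagging requests that impersonate well-known bots such as Google's crawler. How Akamai does this isn't described, but Google documents a check any site can run itself: do a reverse DNS lookup on the client IP, test that the resulting hostname belongs to a Google domain, then do a forward lookup confirming that hostname resolves back to the same IP. Below is a minimal Python sketch of that idea, not a description of any CDN's implementation; `is_verified_googlebot` is a hypothetical helper, and the domain list is an assumption to check against Google's current documentation.

```python
import socket
from typing import Callable, Iterable

# Hostname suffixes Google lists for its crawlers (assumption: verify against
# Google's current "verifying Googlebot" documentation before relying on this).
GOOGLE_CRAWLER_SUFFIXES = (".googlebot.com", ".google.com", ".googleusercontent.com")


def reverse_lookup(ip: str) -> str:
    """Return the PTR hostname for an IP (raises an OSError subclass on failure)."""
    hostname, _aliases, _addrs = socket.gethostbyaddr(ip)
    return hostname


def forward_lookup(hostname: str) -> list[str]:
    """Return the IPv4 addresses a hostname resolves to."""
    _name, _aliases, addrs = socket.gethostbyname_ex(hostname)
    return addrs


def is_verified_googlebot(
    ip: str,
    user_agent: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], Iterable[str]] = forward_lookup,
) -> bool:
    """Forward-confirmed reverse DNS check for a request claiming to be Googlebot."""
    if "Googlebot" not in user_agent:
        return False  # the request does not claim to be Googlebot at all
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:
        return False  # no PTR record, so the claim cannot be verified
    if not host.endswith(GOOGLE_CRAWLER_SUFFIXES):
        return False  # PTR points outside Google's domains: likely an impersonator
    try:
        # Anyone can set a PTR record for their own IP, so confirm the name maps back.
        return ip in forward(host)
    except OSError:
        return False
```

Passing fake `reverse` and `forward` functions lets you exercise the logic without network access. The defaults use `socket.gethostbyaddr` and `socket.gethostbyname_ex`; the latter only returns IPv4 addresses, so IPv6 crawler traffic would need `socket.getaddrinfo` instead.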

poopkins@lemmy.world

#175

By "things like this are awful for the web," I meant that automation through AI is awful for the web. It takes away from the original content creators without any attribution and hits their bottom line.

My story was supposed to be one about responsible AI, but somehow I screwed that up in my summary.

In reply to dremor@lemmy.world (#172):

drmoose@lemmy.world

#176

That's not how it works. Cloudflare uses thousands of variables to estimate a trust score and block people, so just because it works for you doesn't mean it works for everyone.

In reply to utopiah@lemmy.world (#173):

deflated0ne@lemmy.world

#177

Worth it.

In reply to davriellelouna@lemmy.world:

This post did not contain any content.

badlytimedluck@lemmy.world

#178

I hope this isn't too harsh, but I hope their bots fail and the company loses all its investor funding because it's such a big failure.

In reply to dodos@lemmy.world:

jaemo@sh.itjust.works

#179

Thirded. All three (Linux, Firefox, Nexus Mods):

ZERO ISSUES.

In reply to drmoose@lemmy.world (#176):

dremor@lemmy.world

(last edited by dremor@lemmy.world)

#180

The same goes the other way: just because it doesn't work for you doesn't mean it should go away.

That technology has its uses, and Cloudflare is probably aware that there are still some false positives and is probably working on it as we write.

The decision is the website owner's to make, weighing the advantage of filtering out most bots against the disadvantage of losing some legitimate traffic to false positives. If you get a Cloudflare challenge, chances are the owner decided that the former vastly outweighs the latter.

There are self-hosted alternatives, like Anubis, but business clients prefer SaaS like Cloudflare over maintaining their own software. Again, that is their choice and their right.

In reply to davriellelouna@lemmy.world:

electricd@lemmybefree.net

#181

I don't like Cloudflare, but it's nice that they let people stop AI scraping if they want to.

In reply to davriellelouna@lemmy.world:

electricd@lemmybefree.net

(last edited by electricd@lemmybefree.net)

#182

They do have a point, though. It would be great to let per-prompt searches go through, but not mass scraping.

I believe a lot of websites don't want either, though.

In reply to ubergeek@lemmy.today:

And I'm assuming that if robots.txt says their user agent isn't allowed to crawl, it obeys, right?

kissaki@feddit.org

#183

No. As per the article, their argument is that they are not web crawlers generating an index; they are user-action-triggered agents working live for the user.

In reply to davriellelouna@lemmy.world:

jimmycrackcrack@lemmy.ml

#184

Gee, that's a real [removed], ain't it, Perplexity?

In reply to brunbrun6766@lemmy.world:

Step 1, SOMEHOW find a more punchable face than Altman

scarabic@lemmy.world

#185

Altman’s face looks like it’s already been punched

In reply to amberskin@europe.pub:

dinckelman@lemmy.world

#186

No, no, see. When an AI-first company does it, it's actually called courageous innovation. Crimes are for poor people.

In reply to dremor@lemmy.world (#180):

drmoose@lemmy.world

#187

lmao, imagine shilling for corporate Cloudflare like this. Also, false positives and false negatives are fundamentally not equal.

"Cloudflare is probably aware that there are still some false positive, and probably is working on it as we write."

The main issue with Cloudflare is that it's mostly bullshit. It does not report any stats to admins on how many users were rejected, or any false-positive rates, and happily puts everyone under the "evil bot" umbrella. So people from low-trust-score environments, like Linux users or IPs from poorer countries, are at a significant disadvantage and left without a voice.

I'm literally a security dev working with Cloudflare anti-bot myself (not by choice). It's a useful tool for corporations but a really fucking bad one for the health of the web, much worse than any LLM agent or crawler, period.

In reply to kissaki@feddit.org (#183):

ubergeek@lemmy.today

#188

Except it's not a live user hitting 10 sites all at the same time, trying to crawl the entire site... Live users can't do that.

That said, if my robots.txt forbids them from hitting my site, as a proxy, they obey that, right?
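Post #188 asks whether a user-triggered agent acting as a proxy would still honor robots.txt. Honoring it is cheap: Python's standard library ships a parser, `urllib.robotparser`. The sketch below shows the check such an agent could run before each fetch. The `ExampleAIAgent` user-agent string and the `allowed_by_robots` helper are hypothetical, and robots.txt is purely advisory: compliance is entirely up to the client.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one (made-up) AI agent everywhere, allow everyone else.
EXAMPLE_ROBOTS = """\
User-agent: ExampleAIAgent
Disallow: /

User-agent: *
Allow: /
"""


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text that was already downloaded
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.org/article"
    print(allowed_by_robots(EXAMPLE_ROBOTS, "ExampleAIAgent/1.0", page))  # False
    print(allowed_by_robots(EXAMPLE_ROBOTS, "Mozilla/5.0", page))  # True
```

A real agent would more likely fetch the file with `RobotFileParser.set_url()` and `read()` instead of passing text in. Note that `can_fetch` matches a rule's user-agent name as a case-insensitive substring of the product token (the part before the first `/`), so a distinctive agent name matters.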

In reply to tollana1234567@lemmy.today:

Put Meta android Zuckerberg on, or MechaHitler Musk.

umbrella@lemmy.ml

#189

They are busy sucking orange fascist balls.

In reply to electricd@lemmybefree.net (#182):

threeganzi@sh.itjust.works

#190

Does it not need to be scraped to be indexed, assuming it’s semi-typical RAG stuff?