linux-nerds.org

The AI company Perplexity is complaining their bots can't bypass Cloudflare's firewall

Technology

223 posts, 121 commenters, 14 views

K kissaki@feddit.org

Perplexity argues that a platform’s inability to differentiate between helpful AI assistants and harmful bots causes misclassification of legitimate web traffic.

So, I assume Perplexity uses appropriate identifiable user-agent headers, to allow hosters to decide whether to serve them one way or another?
drmoose@lemmy.world wrote:

#160

It's not up to the hoster to decide whom to serve content to. The web is intended to be user-agent agnostic.
1 reply

2
D drmoose@lemmy.world

It's insane that anyone would side with Cloudflare here. To this day I can't visit many websites like nexusmods just because I run Firefox on Linux. The Cloudflare Turnstile just refreshes infinitely and has been doing so for months now.

Cloudflare is the biggest cancer on the web, fucking burn it.
baronofclubs@lemmy.world wrote:

#161

omg ur a hacker

Did you mean Edge on Windows? 'Cause if so, welcome in!
1 reply

1
D drmoose@lemmy.world

It's insane that anyone would side with Cloudflare here. To this day I can't visit many websites like nexusmods just because I run Firefox on Linux. The Cloudflare Turnstile just refreshes infinitely and has been doing so for months now.

Cloudflare is the biggest cancer on the web, fucking burn it.
dodos@lemmy.world wrote:

#162

I'm on Linux with Firefox and have never had that issue before (particularly nexusmods which I use regularly). Something else is probably wrong with your setup.
3 replies

13
K kissaki@feddit.org

Perplexity argues that a platform’s inability to differentiate between helpful AI assistants and harmful bots causes misclassification of legitimate web traffic.

So, I assume Perplexity uses appropriate identifiable user-agent headers, to allow hosters to decide whether to serve them one way or another?
ubergeek@lemmy.today wrote:

#163

And I'm assuming that if a site's robots.txt states their user agent isn't allowed to crawl, it obeys that, right?
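
For anyone who wants to see what a given robots.txt actually says about a bot, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The robots.txt content and the bot name `ExampleAIBot` are made up for illustration (they are not Perplexity's real user agent), and this only shows the check a well-behaved crawler would perform; nothing in robots.txt enforces it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only; "ExampleAIBot" is an
# assumed name, not a real crawler's user agent.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/article"
    print("ExampleAIBot allowed:", may_fetch(ROBOTS_TXT, "ExampleAIBot", url))
    print("Browser allowed:", may_fetch(ROBOTS_TXT, "Mozilla/5.0", url))
```

The whole scheme depends on the crawler sending an honest user-agent string and choosing to run a check like this before fetching.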
1 reply

6
O oppy1984@lemdro.id

I'm out of the loop, what's wrong with Cloudflare?
ubergeek@lemmy.today wrote:

#164

Centralization, mostly, but also their hands-off approach to most fascist content.
1 reply

3
T tomalley8342@lemmy.world

DoS attacks are already a crime, so of course the need for some kind of solution is clear. But any proposal that gatekeeps the internet and restricts the freedom with which the user can interact with it is no solution at all. To me, the openness of the web shouldn't be something that people merely consider or are amenable to. It should be the foundation that every reasonable proposal treats as a core principle.
ubergeek@lemmy.today wrote:

#165

How "open" a website is, is up to the owner, and that's all. Unless we're talking about de-privatizing the internet as a whole, here.
1 reply

0
D dodos@lemmy.world

I'm on Linux with Firefox and have never had that issue before (particularly nexusmods which I use regularly). Something else is probably wrong with your setup.
yeller_king@reddthat.com wrote:

#166

In my case, it's usually the VPN.
1 reply

1
K kreskin@lemmy.world

They can't get their AI to check a box that says "I am not a robot"? I'd think that'd be a first-year comp sci student level task. And robots.txt files were basically always voluntary compliance anyway.
5gruel@lemmy.world wrote:

#167

reCAPTCHA v2 does way more than check whether the box was ticked.

How does Google reCAPTCHA v2 work behind the scenes?

This post refers to Google ReCaptcha v2 (not the latest version) Recently Google introduced a simplified "captcha" verification system (video) that enables users to pass the "captcha" just by clic...

Stack Overflow (stackoverflow.com)
1 reply

3
K kokesh@lemmy.world

Is there some simply deployable PHP honeytrap for AI crawlers?
ubergeek@lemmy.today wrote:

#168

You could probably route all of their requests to your site back at themselves, so they DDoS themselves, and on top of it, cost them more because their endpoint needs to process things via their LLM.
1 reply

0
R rdri@lemmy.world

First we complain that AI steals and trains on our data. Then we complain when it doesn't train. Cool.
ubergeek@lemmy.today wrote:

#169

I think it boils down to "consent" and "remuneration".

I run a website that I do not consent to being accessed for LLMs. However, should LLMs use my content, I should be compensated for such use.

So, these LLM startups ignore both consent and the idea of remuneration.

Most of these concepts have already been worked out in law, if we consider websites akin to real estate: then the typical trespass laws and compensatory usage apply, and hell, even eminent domain if needed, i.e., a city government could "take over" the boosted post feature to make sure alerts get pushed as widely and quickly as possible.
1 reply

0
F fauxliving@lemmy.world

The number of people just reacting to the headline in the comments on these kinds of articles is always surprising.

Your browser acts as an agent too; you don’t manually visit every script link, image source and CSS file. Everyone has experienced how annoying it is to have your browser be targeted by Cloudflare.

There’s a pretty major difference between a human user loading a page and having it summarized and a bot that is scraping 1500 pages/second.

Cheering for Cloudflare to be the arbiter of what technologies are allowed is incredibly short sighted. They exist to provide their clients with services, including bot mitigation. But a user initiated operation isn’t the same as a bot.

Which is the point of the article and the article’s title.

It isn’t clear why OP had to alter the headline to bait the anti-ai crowd.
ubergeek@lemmy.today wrote:

#170

Cheering for Cloudflare to be the arbiter of what technologies are allowed is incredibly short sighted

Except they aren't. It's a toggle, available to users, and by default it allows Perplexity's scraping.
1 reply

0
F fauxliving@lemmy.world

There’s no difference in server load between a user looking at a page and a user using an AI tool to summarize the page.

The AI companies already overwhelmed sites to get training data and are repeating their shitty scraping practices when users interact with their AI. It’s the same fucking thing.

You either didn’t read the article or are deliberately making bad faith arguments. The entire point of the article is that the traffic that they’re referring to is initiated by a user, just like when you type an address into your browser’s address bar.

This traffic, initiated by a user, creates the same server load as that same user loading the page in a browser.

Yes, mass scraping of web pages creates a bunch of server load. This was the case before AI was even a thing.

This situation is like Cloudflare presenting a captcha in order to load each individual image, CSS or JavaScript asset into a web browser because bot traffic pretends to be a browser.

I don’t think it’s too hard to understand that a bot pretending to be a browser and a human operated browser are two completely different things and classifying them as the same (and captchaing them) would be a classification error.

This is exactly the same kind of error. Even if you personally believe that users using AI tools should be blocked, not everyone has the same opinion. If Cloudflare can’t distinguish between bot requests and human requests then their customers can’t opt out and allow their users to use AI tools even if they want to.
ubergeek@lemmy.today wrote:

#171

There’s no difference in server load between a user looking at a page and a user using an AI tool to summarize the page.

There is, in scale.
1 reply

0
D drmoose@lemmy.world

It's insane that anyone would side with Cloudflare here. To this day I can't visit many websites like nexusmods just because I run Firefox on Linux. The Cloudflare Turnstile just refreshes infinitely and has been doing so for months now.

Cloudflare is the biggest cancer on the web, fucking burn it.
dremor@lemmy.world wrote (last edited by dremor@lemmy.world):

#172

Linux and Firefox here. No problem at all with Cloudflare, despite having more or less as many privacy-preserving add-ons as possible. I even spoof my user agent to the latest Firefox ESR on Linux.

Something may be wrong with your setup.
2 replies

20
A amberskin@europe.pub

Uh, are they admitting they are trying to circumvent technological protections set up to restrict access to a system?

Isn’t that a literal computer crime?
utopiah@lemmy.world wrote:

#173

(puts on evil hat) Cloudflare should DRM their protection, then DMCA Perplexity and other US-based "AI" companies to oblivion. Side effect: it might break the Internet.
2 replies

13
D dodos@lemmy.world

I'm on Linux with Firefox and have never had that issue before (particularly nexusmods which I use regularly). Something else is probably wrong with your setup.
drmoose@lemmy.world wrote:

#174

"Wrong with my setup" - that's not how the internet works.

I'm based in Southeast Asia and often work on the road, so my IP reputation is probably the final straw in my fingerprint score.

Either way, this should in no way be acceptable.
1 reply

1
I iphtashufitz@lemmy.world

I hate to break it to you but not only does Cloudflare do this sort of thing, but so does Akamai, AWS, and virtually every other CDN provider out there. And far from being awful, it’s actually protecting the web.

We use Akamai where I work, and they inform us in real time when a request comes from a bot, and they further classify it as one of a dozen or so bots (search engine crawlers, analytics bots, advertising bots, social networks, AI bots, etc). It also informs us if it’s somebody impersonating a well known bot like Google, etc. So we can easily allow search engines to crawl our site while blocking AI bots, bots impersonating Google, and so on.
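
As a rough illustration of how an impersonation check can work (this is not a description of Akamai's internals, which the post doesn't detail), one widely used approach for Googlebot is forward-confirmed reverse DNS: resolve the client IP to a hostname, check that the hostname sits under a Google-owned domain, then resolve that hostname forward and confirm it maps back to the same IP. In the Python sketch below, the function name, the injectable `reverse`/`forward` resolver parameters, and the suffix list are assumptions made for illustration.

```python
import socket
from typing import Callable, Sequence

# Assumed suffix list for this sketch: the reverse-DNS domains Google
# documents for Googlebot. Check Google's current documentation before
# relying on it.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def _reverse(ip: str) -> str:
    """Reverse DNS: IP address -> primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def _forward(host: str) -> Sequence[str]:
    """Forward DNS: hostname -> list of IPv4 addresses."""
    return socket.gethostbyname_ex(host)[2]


def is_verified_googlebot(
    ip: str,
    reverse: Callable[[str], str] = _reverse,
    forward: Callable[[str], Sequence[str]] = _forward,
) -> bool:
    """Return True only if ip passes forward-confirmed reverse DNS."""
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:  # no PTR record or lookup failure
        return False
    if not host.endswith(GOOGLE_SUFFIXES):
        return False  # hostname is not under a Google crawler domain
    try:
        # The hostname must resolve back to the original IP; otherwise
        # anyone controlling their own PTR record could spoof the name.
        return ip in forward(host)
    except OSError:
        return False
```

The resolvers are injectable so the logic can be exercised without network access; a request that merely sends a Googlebot user-agent string from an unrelated IP fails the check. The default forward lookup here only handles IPv4.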
poopkins@lemmy.world wrote:

#175

By "things like this are awful for the web," I meant that automation through AI is awful for the web. It takes away from the original content creators without any attribution and hits their bottom line.

My story was supposed to be one about responsible AI, but somehow I screwed that up in my summary.
1 reply

3
D dremor@lemmy.world

Linux and Firefox here. No problem at all with Cloudflare, despite having more or less as many privacy-preserving add-ons as possible. I even spoof my user agent to the latest Firefox ESR on Linux.

Something may be wrong with your setup.
drmoose@lemmy.world wrote:

#176

That's not how it works. Cloudflare uses thousands of variables to estimate a trust score and block people, so just because it works for you doesn't mean it works for everyone.
1 reply

4
U utopiah@lemmy.world

(puts on evil hat) Cloudflare should DRM their protection, then DMCA Perplexity and other US-based "AI" companies to oblivion. Side effect: it might break the Internet.
deflated0ne@lemmy.world wrote:

#177

Worth it.
1 reply

3
D davriellelouna@lemmy.world

This post did not contain any content.
badlytimedluck@lemmy.world wrote:

#178

I hope this isn't too harsh, but I hope their bots fail and the company loses funding from all investors because it's such a big failure.
1 reply

0
D dodos@lemmy.world

I'm on Linux with Firefox and have never had that issue before (particularly nexusmods which I use regularly). Something else is probably wrong with your setup.
jaemo@sh.itjust.works wrote:

#179

Thirded. All three (Linux, FF, nexus)

ZERO ISSUES.
1 reply

5
