The AI company Perplexity is complaining their bots can't bypass Cloudflare's firewall
-
I'm on Linux with Firefox and have never had that issue before (particularly nexusmods which I use regularly). Something else is probably wrong with your setup.
"Wrong with my setup" - thats not how internet works.
I'm based in south east asia and often work on the road so IP rating probably is the final crutch in my fingerprint score.
Either way this should be no way acceptible.
-
I hate to break it to you, but not only does Cloudflare do this sort of thing, so do Akamai, AWS, and virtually every other CDN provider out there. And far from being awful, it's actually protecting the web.
We use Akamai where I work, and they inform us in real time when a request comes from a bot, and they further classify it as one of a dozen or so bot types (search engine crawlers, analytics bots, advertising bots, social networks, AI bots, etc.). It also tells us if it's somebody impersonating a well-known bot like Google. So we can easily allow search engines to crawl our site while blocking AI bots, bots impersonating Google, and so on.
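To make that concrete, the rule this enables at the origin looks roughly like the sketch below. It's TypeScript for a plain Node server, and the classification header name and category values are made-up stand-ins, not Akamai's actual interface, which exposes the same information through its own edge configuration.

    import { createServer } from "node:http";

    // Hypothetical classification header set upstream by the bot-management layer.
    // The header name and category values are illustrative, not Akamai's real API.
    const BLOCKED = new Set(["ai-crawler", "impersonated-search-bot"]);

    createServer((req, res) => {
      const category = req.headers["x-bot-category"];
      if (typeof category === "string" && BLOCKED.has(category)) {
        res.writeHead(403).end("Automated access not permitted");
        return;
      }
      // Humans, search engine crawlers, and other allowed bot categories pass through.
      res.writeHead(200).end("Hello");
    }).listen(3000);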
When I said "things like this are awful for the web," I meant that automation through AI is awful for the web. It takes away from the original content creators without any attribution and hurts their bottom line.
My story was supposed to be one about responsible AI, but somehow I screwed that up in my summary.
-
Linux and Firefox here. No problem at all with Cloudflare, despite running more or less every privacy-preserving add-on possible. I even spoof my user agent to the latest Firefox ESR on Linux.
Something may be wrong with your setup.
That's not how it works. Cloudflare uses thousands of variables to estimate a trust score and block people, so just because it works for you doesn't mean it works for everyone.
-
*puts on evil hat* CloudFlare should DRM their protection, then DMCA Perplexity and other US-based "AI" companies into oblivion. Side effect: might break the Internet.
Worth it.
-
I hope this isn't too harsh, but I hope their bots fail and the company loses funding from all its investors because it's such a big failure.
-
I'm on Linux with Firefox and have never had that issue before (particularly nexusmods which I use regularly). Something else is probably wrong with your setup.
Thirded. All three (Linux, FF, nexus)
ZERO ISSUES.
-
That's not how it works. Cloudflare uses thousands of variables to estimate a trust score and block people, so just because it works for you doesn't mean it works for everyone.
The same goes the other way: just because it doesn't work for you doesn't mean it should go away.
That technology has its uses, and Cloudflare is probably aware that there are still some false positives and is probably working on them as we write.
The decision is the website owner's to make, weighing the advantage of filtering out the majority of bots against the disadvantage of losing some legitimate traffic to false positives. If you get a Cloudflare challenge, chances are the owner decided that the former vastly outweighs the latter.
Now there are some self-hosted alternatives, like Anubis, but business clients prefer SaaS like Cloudflare to maintaining their own software. Once again, it is their choice and their liberty to make.
-
I don't like Cloudflare, but it's nice that they allow people to stop AI scraping if they want to
-
They do have a point though. It would be great to let per-prompt searches go through, but not mass scraping
I believe a lot of websites don't want either, though
-
And I'm assuming that if the robots.txt states their user agent isn't allowed to crawl, it obeys that, right?
No, as per the article, their argument is that they are not web crawlers building an index; they are user-action-triggered agents working live for the user.
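For reference, Perplexity publishes separate user agent names for its index crawler and its on-demand agent, so a robots.txt can address both; the names below come from their public documentation and are worth double-checking, and their position is precisely that the on-demand agent isn't bound by crawler rules like these:

    # Block Perplexity's index crawler
    User-agent: PerplexityBot
    Disallow: /

    # Block the user-triggered agent too (this is the one Perplexity argues
    # isn't a crawler, which is exactly what the dispute is about)
    User-agent: Perplexity-User
    Disallow: /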
-
Gee that's a real removed it ain't it perplexity?
-
Step 1, SOMEHOW find a more punchable face than Altman
Altman’s face looks like it’s already been punched
-
Uh, are they admitting they are trying to circumvent technological protections set up to restrict access to a system?
Isn’t that a literal computer crime?
No-no, see. When an AI-first company does it, it's actually called courageous innovation. Crimes are for poor people
-
The same goes the other way: just because it doesn't work for you doesn't mean it should go away.
That technology has its uses, and Cloudflare is probably aware that there are still some false positives and is probably working on them as we write.
The decision is the website owner's to make, weighing the advantage of filtering out the majority of bots against the disadvantage of losing some legitimate traffic to false positives. If you get a Cloudflare challenge, chances are the owner decided that the former vastly outweighs the latter.
Now there are some self-hosted alternatives, like Anubis, but business clients prefer SaaS like Cloudflare to maintaining their own software. Once again, it is their choice and their liberty to make.
lmao imagine shilling for corporate Cloudflare like this. Also, false positives vs. false negatives are fundamentally not equal.
Cloudflare is probably aware that there are still some false positives and is probably working on them as we write.
The main issue with Cloudflare is that it's mostly bullshit. It does not report any stats to admins on how many users were rejected or any false positive rates, and it happily puts everyone under the "evil bot" umbrella. So people in low-trust-score environments, like Linux users or IPs from poorer countries, are at a significant disadvantage and left without a voice.
I'm literally a security dev working with Cloudflare anti-bot myself (not by choice). It's a useful tool for corporate but a really fucking bad one for the health of the web, much worse than any LLM agent or crawler, period.
-
No, as per the article, their argument is that they are not web crawlers building an index; they are user-action-triggered agents working live for the user.
Except it's not a live user hitting 10 sites all at the same time, trying to crawl the entire site... Live users cannot do that.
That said, if my robots.txt forbids them from hitting my site, as a proxy, they obey that, right?
-
Put Meta's android Zuckerberg on, or MechaHitler Musk.
They're busy sucking orange fascist balls.
-
They do have a point though. It would be great to let per-prompt searches go through, but not mass scraping
I believe a lot of websites don't want either, though
Does it not need to be scraped to be indexed, assuming it’s semi-typical RAG stuff?
-
I don't like Cloudflare, but it's nice that they allow people to stop AI scraping if they want to
CloudFlare has become an Internet protection racket and I'm not happy about it.
-
Cloudflare actually fully fingerprints your browser and even sells that data. That's your IP, TLS, operating system, full browser environment, installed extensions, GPU capabilities, etc. It's all tracked before the box even shows up; in fact, the box is there to give the runtime more time to fingerprint you.
Yeah, and the worst part is it doesn't fucking work for the one thing it's supposed to do.
The only thing it does is stop the stupidest low-effort scrapers and force the good ones to use a browser.
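As a rough illustration of the client-side half of that fingerprinting (IP reputation and TLS fingerprints are observed server-side, before any script runs), a handful of standard browser APIs is enough to build a surprisingly specific profile. This is a generic sketch, not Cloudflare's actual challenge code:

    // Generic sketch of client-side fingerprint signals; not Cloudflare's actual code.
    // IP reputation and TLS (JA3-style) fingerprints are collected server-side instead.
    function collectSignals() {
      const canvas = document.createElement("canvas");
      const gl = canvas.getContext("webgl");
      const dbg = gl?.getExtension("WEBGL_debug_renderer_info");
      return {
        userAgent: navigator.userAgent,
        languages: navigator.languages.join(","),
        cores: navigator.hardwareConcurrency,
        screen: `${screen.width}x${screen.height}x${screen.colorDepth}`,
        timezone: Intl.DateTimeFormat().resolvedOptions().timeZone,
        // The GPU model leaks through WebGL on most browsers.
        gpu: dbg ? String(gl!.getParameter(dbg.UNMASKED_RENDERER_WEBGL)) : "unknown",
      };
    }

    // Any single value is common; the combination is often close to unique.
    console.log(collectSignals());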
-
"Wrong with my setup" - thats not how internet works.
I'm based in south east asia and often work on the road so IP rating probably is the final crutch in my fingerprint score.
Either way this should be no way acceptible.
That is exactly how the internet works. That's always how the internet has worked.