linux-nerds.org

The AI company Perplexity is complaining their bots can't bypass Cloudflare's firewall

223 posts 121 commenters 14 views

ubergeek@lemmy.today

And I'm assuming that if the robots.txt states their User-Agent isn't allowed to crawl, it obeys it, right?
kissaki@feddit.org

#183

No, as per the article, their argument is that they are not web crawlers generating an index; they are user-action-triggered agents working live for the user.
1

davriellelouna@lemmy.world

This post did not contain any content.
jimmycrackcrack@lemmy.ml

#184

Gee that's a real removed it ain't it perplexity?
4

brunbrun6766@lemmy.world

Step 1, SOMEHOW find a more punchable face than Altman
scarabic@lemmy.world

#185

Altman’s face looks like it’s already been punched
1

amberskin@europe.pub

Uh, are they admitting they are trying to circumvent technological protections set up to restrict access to a system?

Isn’t that a literal computer crime?
dinckelman@lemmy.world

#186

No-no, see. When an AI-first company does it, it's actually called courageous innovation. Crimes are for poor people
31

dremor@lemmy.world

Same goes the other way. Just because it doesn't work for you doesn't mean it should go away.

That technology has its uses, and Cloudflare is probably aware that there are still some false positives and is probably working on them as we write.

The decision is the website owner's to make, weighing the advantage of filtering out the majority of bots against the disadvantage of losing some legitimate traffic to false positives. If you get a Cloudflare challenge, chances are that the owner decided the former vastly outweighs the latter.

Now, there are some self-hosted alternatives, like Anubis, but business clients prefer SaaS like Cloudflare to maintaining their own software. Once again, it is their choice and their liberty to do so.
drmoose@lemmy.world

#187

lmao imagine shilling for corporate Cloudflare like this. Also, false positives and false negatives are fundamentally not equal.

Cloudflare is probably aware that there are still some false positives, and probably is working on it as we write.

The main issue with Cloudflare is that it's mostly bullshit. It does not report any stats to the admins on how many users were rejected, or any false positive rates, and happily puts everyone under the "evil bot" umbrella. So people from low-trust-score environments, like Linux or IPs from poorer countries, are at a significant disadvantage and left without a voice.

I'm literally a security dev working with Cloudflare anti-bot myself (not by choice). It's a useful tool for corporate but a really fucking bad one for the health of the web, much worse than any LLM agent or crawler, period.
3

kissaki@feddit.org

No, as per the article, their argument is that they are not web crawlers generating an index; they are user-action-triggered agents working live for the user.
ubergeek@lemmy.today

#188

Except it's not a live user hitting 10 sites all at the same time, trying to crawl the entire site... Live users cannot do that.

That said, if my robots.txt forbids them from hitting my site, as a proxy, they obey that, right?
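
For reference, here is a minimal sketch of what honoring robots.txt would look like for a client that chooses to comply, using Python's standard `urllib.robotparser`. The bot name `ExampleAIBot`, the rules, and the example.org URL are made up for illustration; nothing here reflects how Perplexity or Cloudflare actually behave. Keep in mind that robots.txt is advisory, so a client can simply ignore it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named bot, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant client with this user agent may fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.org/some/page"
    print("ExampleAIBot allowed:", is_allowed(ROBOTS_TXT, "ExampleAIBot", page))
    print("Generic browser allowed:", is_allowed(ROBOTS_TXT, "Mozilla/5.0", page))
```

The check only works if the client identifies itself honestly and performs the lookup at all, which is exactly the point in dispute in this thread.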
1

tollana1234567@lemmy.today

Put Meta android Zuckerberg on, or MechaHitler Musk.
umbrella@lemmy.ml

#189

they are busy sucking orange fascist balls.
1

electricd@lemmybefree.net

They do have a point, though. It would be great to let per-prompt searches go through, but not mass scraping.

I believe a lot of websites don't want either, though.
threeganzi@sh.itjust.works

#190

Does it not need to be scraped to be indexed, assuming it’s semi-typical RAG stuff?
1

electricd@lemmybefree.net

I don't like Cloudflare, but it's nice that they allow people to stop AI scraping if they want to.
tempest@lemmy.ca

#191

Cloudflare has become an Internet protection racket and I'm not happy about it.
17

drmoose@lemmy.world

Cloudflare actually fully fingerprints your browser and even sells that data. That's your IP, TLS, operating system, full browser environment, installed extensions, GPU capabilities, etc. It's all tracked before the box even shows up; in fact, the box is there to give the runtime more time to fingerprint you.
tempest@lemmy.ca

#192

Yeah, and the worst part is it doesn't fucking work for the one thing it's supposed to do.

The only thing it does is stop the stupidest low-effort scrapers and force the good ones to use a browser.
4

drmoose@lemmy.world

"Wrong with my setup" - that's not how the internet works.

I'm based in Southeast Asia and often work on the road, so IP rating is probably the final straw in my fingerprint score.

Either way, this should in no way be acceptable.
jcbazpx@lemmy.world

#193

That is exactly how the internet works. That's always how the internet has worked.
1

dinckelman@lemmy.world

No-no, see. When an AI-first company does it, it's actually called courageous innovation. Crimes are for poor people
silicon@lemmy.world

#194

See: Facebook/Meta
6

ubergeek@lemmy.today

Centralization, mostly, but also their hands-off approach to most fascist content.
jcbazpx@lemmy.world

#195

They kind of have to be hands off or risk losing safe harbor protections.
2

dremor@lemmy.world

Linux and Firefox here. No problem at all with Cloudflare, despite having about as many privacy-preserving add-ons as possible. I even spoof my user agent to the latest Firefox ESR on Linux.

Something may be wrong with your setup.
coaster1921@lemmy.ml

#196

I suspect a lot of it comes down to your ISP. Like the original commenter, I also frequently can't pass Cloudflare Turnstile when on Wi-Fi, although refreshing the page a few times usually gets me through. Worst case, I can pass much more consistently on my phone's hotspot. It's super annoying and, combined with their recent DNS outage, has totally ruined any respect I had for Cloudflare.

Interesting video on the subject: https://youtu.be/SasXJwyKkMI
2

ubergeek@lemmy.today

How "open" a website is, is up to the owner, and that's all. Unless we're talking about de-privatizing the internet as a whole, here.
tomalley8342@lemmy.world

#197

How “open” a website is, is up to the owner, and that’s all.

As someone who registered this account on this platform in response to Reddit's API restrictions, it would be hypocritical of me to accept such a belief.
0

tempest@lemmy.ca

Cloudflare has become an Internet protection racket and I'm not happy about it.
laser@feddit.org

#198

It's been like this from the very beginning. But they don't fit the definition of a protection racket, as they're not the ones attacking you if you don't pay up. So they're more like a security company that has no competitors because of the investment needed to operate.
12

drmoose@lemmy.world

lmao imagine shilling for corporate Cloudflare like this. Also, false positives and false negatives are fundamentally not equal.

Cloudflare is probably aware that there are still some false positives, and probably is working on it as we write.

The main issue with Cloudflare is that it's mostly bullshit. It does not report any stats to the admins on how many users were rejected, or any false positive rates, and happily puts everyone under the "evil bot" umbrella. So people from low-trust-score environments, like Linux or IPs from poorer countries, are at a significant disadvantage and left without a voice.

I'm literally a security dev working with Cloudflare anti-bot myself (not by choice). It's a useful tool for corporate but a really fucking bad one for the health of the web, much worse than any LLM agent or crawler, period.
laser@feddit.org

#199

So people from low trust score environments like Linux

Linux user here, Cloudflare hasn't blocked access to a single page for me unless I use a VPN, which then can trigger it.
1

tomalley8342@lemmy.world

How “open” a website is, is up to the owner, and that’s all.

As someone who registered this account on this platform in response to Reddit's API restrictions, it would be hypocritical of me to accept such a belief.
ubergeek@lemmy.today

#200

Well, until we abolish capitalism, that's the state of things. Unless you feel like Nazis MUST be freely given access to everything too?
0

drmoose@lemmy.world

It's insane that anyone would side with Cloudflare here. To this day I can't visit many websites like nexusmods just because I run Firefox on Linux. The Cloudflare Turnstile just refreshes infinitely and has been doing so for months now.

Cloudflare is the biggest cancer on the web, fucking burn it.
catdogl0ver@lemmy.world

#201

It happened to me too, until I did a Google search. It was my VPN's web protection. It was too "overprotective".

Check your security settings, antivirus, and VPN.
0

int32@lemmy.dbzer0.com

They can use web.archive.org as a CDN (I do that with Cloudflare websites). But honestly, Cloudflare or not, the internet is broken.
turmoil@feddit.org

#202

Using archive.org as a CDN at the scale of Cloudflare would be an immediate death sentence for archive.org.

1
