linux-nerds.org


The AI company Perplexity is complaining their bots can't bypass Cloudflare's firewall

Technology

117 posts · 66 commenters · 0 views

Original post by davriellelouna@lemmy.world:

This post did not contain any content.

#41 encryptkeeper@lemmy.world wrote:

I can’t get over their CEO, who looks like a nine-year-old. Not sure what it is about him.
2 replies · 4 votes

In reply to interdimensionalmeme@lemmy.ml:

Just buy cloudflare duh

#42 _cryptagion@lemmy.dbzer0.com wrote:

The anti-AI shield and bot-fight mode are free, you don't need to pay anything to use them.
1 reply · 0 votes

In reply to gamingchairmodel@lemmy.world:

Fuck that. I don't need prosecutors and the courts to rule that accessing publicly available information in a way that the website owner doesn't want is literally a crime. That logic would extend to ad blockers and editing HTML/js in an "inspect element" tab.

#43 encryptkeeper@lemmy.world wrote:

That logic would not extend to ad blockers, as the point of concern is gaining unauthorized access to a computer system or asset. Blocking ads would not be considered gaining unauthorized access to anything. In fact it would be the opposite of that.
3 replies · 27 votes

In reply to panda_abyss@lemmy.ca:

Cloudflare runs as a CDN/cache/gateway service in front of a ton of websites. Their service is to help protect against DDoS and malicious traffic.

A few weeks ago cloudflare announced they were going to block AI crawling (good, in my opinion). However they also added a paid service that these AI crawlers can use, so it actually becomes a revenue source for them.

This is a response to that from Perplexity who run an AI search company. I don’t actually know how their service works, but they were specifically called out in the announcement and Cloudflare accused them of “stealth scraping” and ignoring robots.txt and other things.

#44 _cryptagion@lemmy.dbzer0.com wrote:

It should be pointed out that Cloudflare didn't say they were going to block AI traffic, they give you the option to. The service is a free opt-in for people who want it.
1 reply · 5 votes

In reply to the original post (davriellelouna@lemmy.world):

#45 cupcakezealot@piefed.blahaj.zone wrote:

rare cloudflare w
1 reply · 43 votes

In reply to pennomi@lemmy.world:

On the flip side, most websites are so ad-ridden these days a reader mode or other summary tool is almost required for normal browsing. Not saying that AI is the right move, but I can understand not wanting to visit the actual page any more.

#46 harkmahlberg@kbin.earth wrote:

Maybe I missed something, but uBlock still works fine for me, even on mobile. And running a Pi-hole, while not trivial, also takes care of some ad traffic. Firefox comes with a reader mode (a feature I really like even with the ad blockers!).

So why do people not want to visit pages anymore, if all these tools already existed?
1 reply · 1 vote

In reply to fauxliving@lemmy.world:

The amount of people just reacting to the headline in the comments on these kinds of articles is always surprising.

Your browser acts as an agent too, you don’t manually visit every script link, image source and CSS file. Everyone has experienced how annoying it is to have your browser be targeted by Cloudflare.

There’s a pretty major difference between a human user loading a page and having it summarized and a bot that is scraping 1500 pages/second.

Cheering for Cloudflare to be the arbiter of what technologies are allowed is incredibly short sighted. They exist to provide their clients with services, including bot mitigation. But a user initiated operation isn’t the same as a bot.

Which is the point of the article and the article’s title.

It isn’t clear why OP had to alter the headline to bait the anti-ai crowd.

#47 _cryptagion@lemmy.dbzer0.com wrote:

> Cheering for Cloudflare to be the arbiter of what technologies are allowed is incredibly short sighted. They exist to provide their clients with services, including bot mitigation.

Well I suppose it's a good thing then that the anti-AI shield is opt-in, and Cloudflare isn't making any decisions for anyone on whether or not AI scrapers get to visit their pages. That little bit of context makes your entire argument fall apart.
1 reply · 2 votes

In reply to spankmonkey@lemmy.world:

> But a user initiated operation isn’t the same as a bot.

Oh fuck off with that AI company propaganda.

The AI companies already overwhelmed sites to get training data and are repeating their shitty scraping practices when users interact with their AI. It's the same fucking thing.

Web crawlers for search engines don't scrape pages every time a user searches like AI does. Both web crawlers and scrapers are bots, and how a human initiates their operation, scheduled or not, doesn't matter as much as the fact that they do things very differently and only one of the two respects robots.txt.
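For readers unfamiliar with robots.txt: it is a plain-text file a site publishes listing which paths each crawler may fetch, and honoring it is entirely up to the crawler. A minimal sketch of the check a well-behaved crawler runs before fetching, using Python's standard `urllib.robotparser`; the robots.txt contents, the `ExampleAIBot` agent name, and the example.com URLs are all hypothetical:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one named crawler entirely and keep
# every other agent away from /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return True if the robots.txt above permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())  # parse rules from text, no network
    return parser.can_fetch(user_agent, url)


print(allowed("ExampleAIBot", "https://example.com/article"))   # False
print(allowed("SomeBrowser", "https://example.com/article"))    # True
print(allowed("SomeBrowser", "https://example.com/private/x"))  # False
```

Nothing enforces this check: it only works if the crawler actually runs it and identifies itself honestly in its user agent.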

#48 fauxliving@lemmy.world wrote:

There’s no difference in server load between a user looking at a page and a user using an AI tool to summarize the page.

> The AI companies already overwhelmed sites to get training data and are repeating their shitty scraping practices when users interact with their AI. It’s the same fucking thing.

You either didn’t read the article or are deliberately making bad faith arguments. The entire point of the article is that the traffic that they’re referring to is initiated by a user, just like when you type an address into your browser’s address bar.

This traffic, initiated by a user, creates the same server load as that same user loading the page in a browser.

Yes, mass scraping of web pages creates a bunch of server load. This was the case before AI was even a thing.

This situation is like Cloudflare presenting a CAPTCHA before loading each individual image, CSS or JavaScript asset in a web browser, because bot traffic pretends to be a browser.

I don’t think it’s too hard to understand that a bot pretending to be a browser and a human operated browser are two completely different things and classifying them as the same (and captchaing them) would be a classification error.

This is exactly the same kind of error. Even if you personally believe that users using AI tools should be blocked, not everyone has the same opinion. If Cloudflare can’t distinguish between bot requests and human requests then their customers can’t opt out and allow their users to use AI tools even if they want to.
1 reply · 3 votes

In reply to _cryptagion@lemmy.dbzer0.com (#47):

#49 fauxliving@lemmy.world wrote:

It isn’t opt in.

You can block all bot page scraping, and also block user initiated AI tools or you can block no traffic.

There isn’t an option to block bot page scraping but allow user initiated AI tools.

Because, as the article points out, Cloudflare is not able to distinguish between the two.
2 replies · 2 votes

In reply to floquant@lemmy.dbzer0.com:

It's difficult to be a shittier company than OpenAI, but Perplexity seems to be trying hard.

#50 brunbrun6766@lemmy.world wrote:

Step 1, SOMEHOW find a more punchable face than Altman
1 reply · 31 votes

In reply to pr06lefs@lemmy.ml:

So you're a cloudflare customer and you wish they would let the perplexity traffic multiplier through to your website? You can leave cloudflare any time you want.

#51 fauxliving@lemmy.world wrote:

You’re an Internet user and you don’t like AI so you can leave the Internet anytime you want.

That’s not a good argument, what about the users who want to block mass scraping but want to make their content available to users who are using these tools? Cloudflare exists because it allows legitimate traffic, that websites want, and blocks mass scraping which the sites don’t want.

If they’re not able to distinguish mass scraping traffic from user created traffic then they’re blocking legitimate users that some website owners want.
1 reply · 0 votes

In reply to the original post (davriellelouna@lemmy.world):

#52 gissamittjobb@lemmy.ml wrote:

Skill issue. Cope and seethe
1 reply · 9 votes

In reply to _cryptagion@lemmy.dbzer0.com (#42):

#53 interdimensionalmeme@lemmy.ml wrote:

No, I'm telling Perplexity they can just buy their obstacle.

People who use the things you have described for free are themselves the products being sold; this is implied in the price.
1 reply · 1 vote

In reply to encryptkeeper@lemmy.world (#43):

#54 demdaru@lemmy.world wrote:

Ehhhh, you are gaining access to content on the assumption that you are going to interact with ads and thus bring revenue to the person and/or company producing said content. If you block ads, you remove the authorisation the ads brought you.
2 replies · 4 votes

#55 fauxliving@lemmy.world wrote:

What does any of that have to do with the fact that Cloudflare isn’t able to classify traffic in order to distinguish between human user generated traffic and mass scraping bot traffic?

If they’re incapable of distinguishing the two, then their customers are having legitimate user requests blocked by Cloudflare with no ability to opt out.

Oh I see lol

Yeah, I think people who’re unable to think rationally about a problem because they made up their mind before knowing any of the details are intellectually lazy.
1 reply · 0 votes

In reply to the original post (davriellelouna@lemmy.world):

#56 starchylemming@lemmy.world wrote:

next step: cloudflare sends hit squads to blow up the source of these slimy data grabber attacks
1 reply · 0 votes

In reply to the original post (davriellelouna@lemmy.world):

#57 josefo@leminal.space wrote:

I really hope Cloudflare doesn't eventually evolve into a shitty ass company, so far I like them very much, and all this massive L for AI only improves my opinion on them.
1 reply · 0 votes

In reply to betadoggo_@lemmy.world:

Perplexity (an "AI search engine" company with 500 million in funding) can't bypass cloudflare's anti-bot checks. For each search Perplexity scrapes the top results and summarizes them for the user. Cloudflare intentionally blocks perplexity's scrapers because they ignore robots.txt and mimic real users to get around cloudflare's blocking features. Perplexity argues that their scraping is acceptable because it's user initiated.

Personally I think cloudflare is in the right here. The scraped sites get 0 revenue from Perplexity searches (unless the user decides to go through the sources section and click the links) and Perplexity's scraping is unnecessarily traffic intensive since they don't cache the scraped data.
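To illustrate the caching point (this is the commenter's claim about Perplexity, not something verified here): a scraper that keeps fetched pages for a while hits the origin site once per page per time window, no matter how many user queries cite that page. A minimal sketch, assuming a simple in-memory cache with a fixed time-to-live; `TTLPageCache`, `fake_fetch`, the URL, and the 600-second TTL are hypothetical stand-ins for a real HTTP client and caching policy:

```python
import time
from typing import Callable, Dict, Tuple


class TTLPageCache:
    """Reuse a fetched page until it is older than ttl seconds."""

    def __init__(self, fetch: Callable[[str], str], ttl: float = 3600.0):
        self._fetch = fetch  # function that actually requests the page from the site
        self._ttl = ttl
        self._store: Dict[str, Tuple[float, str]] = {}  # url -> (fetched_at, body)

    def get(self, url: str) -> str:
        now = time.monotonic()
        hit = self._store.get(url)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # fresh copy: no request reaches the site
        body = self._fetch(url)  # miss or stale: fetch once and remember it
        self._store[url] = (now, body)
        return body


# Hypothetical stand-in for a real HTTP request that counts origin hits.
origin_hits = {"count": 0}


def fake_fetch(url: str) -> str:
    origin_hits["count"] += 1
    return f"<html>contents of {url}</html>"


cache = TTLPageCache(fake_fetch, ttl=600)
for _ in range(5):  # five user searches that all cite the same page
    cache.get("https://example.com/article")
print(origin_hits["count"])  # 1
```

A real crawler would more likely follow the site's HTTP cache headers (`Cache-Control`, `ETag`) than a fixed TTL; the fixed TTL here is a simplification.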

#58 lividweasel@lemmy.world wrote:

> …and Perplexity's scraping is unnecessarily traffic intensive since they don't cache the scraped data.

That seems almost maliciously stupid. We need to train a new model. Hey, where’d the data go? Oh well, let’s just go scrape it all again. Wait, did we already scrape this site? No idea, let’s scrape it again just to be sure.
2 replies · 2 votes

In reply to encryptkeeper@lemmy.world (#43):

#59 gamingchairmodel@lemmy.world wrote:

> gaining unauthorized access to a computer system

And my point is that defining "unauthorized" to include visitors using unauthorized tools/methods to access a publicly visible resource would be a policy disaster.

If I put a banner on my site that says "by visiting my site you agree not to modify the scripts or ads displayed on the site," does that make my visit with an ad blocker "unauthorized" under the CFAA? I think the answer should obviously be "no," and that the way to define "authorization" is whether the website puts up some kind of login/authentication mechanism to block or allow specific users, not to put a simple request to the visiting public to please respect the rules of the site.

To me, a robots.txt is more like a friendly request to unauthenticated visitors than it is a technical implementation of some kind of authentication mechanism.

Scraping isn't hacking. I agree with the Third Circuit and the EFF: If the website owner makes a resource available to visitors without authentication, then accessing those resources isn't a crime, even if the website owner didn't intend for site visitors to use that specific method.
2 replies · 10 votes

In reply to fauxliving@lemmy.world (#51):

#60 pr06lefs@lemmy.ml wrote:

Yes your "leave the internet any time you want" strawman is not a good argument.

If allowing perplexity while blocking the bad guys is so easy why not find a service that does that for you?
1 reply · 1 vote
