The AI company Perplexity is complaining their bots can't bypass Cloudflare's firewall
-
This post did not contain any content.
This is why companies like Perplexity and OpenAI are creating browsers.
-
gaining unauthorized access to a computer system
And my point is that defining "unauthorized" to include visitors using unauthorized tools/methods to access a publicly visible resource would be a policy disaster.
If I put a banner on my site that says "by visiting my site you agree not to modify the scripts or ads displayed on the site," does that make my visit with an ad blocker "unauthorized" under the CFAA? I think the answer should obviously be "no," and that the way to define "authorization" is whether the website puts up some kind of login/authentication mechanism to block or allow specific users, not to put a simple request to the visiting public to please respect the rules of the site.
To me, a robots.txt is more like a friendly request to unauthenticated visitors than it is a technical implementation of some kind of authentication mechanism.
Scraping isn't hacking. I agree with the Third Circuit and the EFF: If the website owner makes a resource available to visitors without authentication, then accessing those resources isn't a crime, even if the website owner didn't intend for site visitors to use that specific method.
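For what it's worth, robots.txt is literally just a plain-text file of polite directives with nothing enforcing it, which is the "friendly request" point above. A hypothetical entry asking known AI crawlers to stay away might look like this (the user-agent tokens are examples of bots that identify themselves):

```
# robots.txt: a request, not an access control
User-agent: PerplexityBot
Disallow: /

User-agent: GPTBot
Disallow: /

# Everyone else may crawl everything
User-agent: *
Disallow:
```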
When sites put up challenges like Anubis or other measures to verify that the visitor isn't a robot, and scrapers then employ spoofing or other means to thwart that verification, I think that reasonably counts as a violation of the CFAA, at least in spirit, especially since these mass scraping operations are drawing attention for the damage they cause to site operators (another factor in the CFAA, and one that would elevate this to felony activity).
The fact is these laws are already on the books; we may as well use them to shut down the objectively harmful activity these AI scrapers are engaged in.
-
Yes, your "leave the internet any time you want" straw man is not a good argument.
If allowing Perplexity while blocking the bad guys is so easy, why not find a service that does that for you?
The topic is that Cloudflare is classifying human sourced traffic as bot sourced traffic.
Saying “Just don’t use it” is a straw man. It doesn’t change the fact that Cloudflare, one of the largest CDNs representing a significant portion of the websites and services in the US, is misclassifying traffic.
I used mine intentionally while knowing it was a straw man, did you?
The same with “if it’s so easy, just don’t use it” hopefully for obvious reasons.
This affects both the customers of Cloudflare (the web service owners) as well as the users of the web services. A single site/user opting out doesn’t change the fact that a large portion of the Internet is classifying human sourced traffic as bot sourced traffic.
-
Skill issue. Cope and seethe
this made me lol
-
There’s no difference in server load between a user looking at a page and a user using an AI tool to summarize the page.
The AI companies already overwhelmed sites to get training data and are repeating their shitty scraping practices when users interact with their AI. It’s the same fucking thing.
You either didn’t read the article or are deliberately making bad faith arguments. The entire point of the article is that the traffic that they’re referring to is initiated by a user, just like when you type an address into your browser’s address bar.
This traffic, initiated by a user, creates the same server load as that same user loading the page in a browser.
Yes, mass scraping of web pages creates a bunch of server load. This was the case before AI was even a thing.
This situation is like Cloudflare presenting a CAPTCHA in order to load each individual image, CSS, or JavaScript asset into a web browser, because bot traffic pretends to be a browser.
I don’t think it’s too hard to understand that a bot pretending to be a browser and a human operated browser are two completely different things and classifying them as the same (and captchaing them) would be a classification error.
This is exactly the same kind of error. Even if you personally believe that users using AI tools should be blocked, not everyone has the same opinion. If Cloudflare can’t distinguish between bot requests and human requests then their customers can’t opt out and allow their users to use AI tools even if they want to.
There is no difference between emptying a glass of water and draining swimming pool either if you ignore the total volume of water.
-
It isn’t opt in.
You can block all bot page scraping along with user-initiated AI tools, or you can block no traffic at all.
There isn't an option to block bot page scraping but allow user-initiated AI tools.
Because, as the article points out, Cloudflare is not able to distinguish between the two.
There's no appreciable difference between the two in how they affect site owners' systems.
-
The topic is that Cloudflare is classifying human sourced traffic as bot sourced traffic.
Saying “Just don’t use it” is a straw man. It doesn’t change the fact that Cloudflare, one of the largest CDNs representing a significant portion of the websites and services in the US, is misclassifying traffic.
I used mine intentionally while knowing it was a straw man, did you?
The same with “if it’s so easy, just don’t use it” hopefully for obvious reasons.
This affects both the customers of Cloudflare (the web service owners) as well as the users of the web services. A single site/user opting out doesn’t change the fact that a large portion of the Internet is classifying human sourced traffic as bot sourced traffic.
LOL "human sourced traffic" oh the tragedy. I for one am rooting for perplexity to go out of business forever.
-
There is no difference between emptying a glass of water and draining swimming pool either if you ignore the total volume of water.
I, too, can make any argument sound silly if I want to argue in bad faith.
A user cannot physically generate as much traffic as a bot.
Just like a glass of water cannot physically contain as much water as a swimming pool. Pretending the two are equal is ignorant in both cases.
-
This post did not contain any content.
Well... Good.
-
I, too, can make any argument sound silly if I want to argue in bad faith.
A user cannot physically generate as much traffic as a bot.
Just like a glass of water cannot physically contain as much water as a swimming pool. Pretending the two are equal is ignorant in both cases.
A user cannot physically generate as much traffic as a bot.
You are so close to getting it!
-
rare cloudflare w
As far as security is concerned, their w's are pretty common tbh. It's just the whole centralization issue.
-
A user cannot physically generate as much traffic as a bot.
You are so close to getting it!
And you’re not even close.
-
…and Perplexity's scraping is unnecessarily traffic intensive since they don't cache the scraped data.
That seems almost maliciously stupid. We need to train a new model. Hey, where’d the data go? Oh well, let’s just go scrape it all again. Wait, did we already scrape this site? No idea, let’s scrape it again just to be sure.
It's worth giving the article a read. It seems that they're not using the data for training, but for real-time results.
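For illustration, here's a minimal sketch (hypothetical code, not Perplexity's actual fetcher) of the kind of short-lived caching people are saying is missing. Even something this naive would avoid re-downloading the same page for every query:

```python
import time
import urllib.request

# Hypothetical illustration: keep fetched pages around for a while instead of
# re-downloading them on every query. Not anyone's real fetcher.
_cache: dict[str, tuple[float, bytes]] = {}
CACHE_TTL_SECONDS = 15 * 60  # serve a cached copy for up to 15 minutes

def fetch(url: str) -> bytes:
    now = time.time()
    hit = _cache.get(url)
    if hit and now - hit[0] < CACHE_TTL_SECONDS:
        return hit[1]  # cached copy: zero load on the origin server
    with urllib.request.urlopen(url) as resp:
        body = resp.read()
    _cache[url] = (now, body)
    return body
```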
-
There's no appreciable difference between the two in how they affect site owners' systems.
There’s a pretty significant difference in request rate. A tool trying to search and summarize will hit a search engine once, and each website maybe 5 times (if every search engine link points to the site).
A bot trying to scrape content from a website can generate thousands or tens of thousands of requests per second.
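Rough numbers: a person clicking through pages at one request every several seconds would need about a day to produce what a scraper doing 10,000 requests per second generates in one second. That gap is also why a simple per-IP rate limit catches bulk scrapers without ever bothering a human. A hypothetical nginx sketch (names and limits made up):

```nginx
# Hypothetical sketch: per-IP rate limit that no human browsing pattern
# will hit, but a bulk scraper exceeds almost immediately.
events {}

http {
    limit_req_zone $binary_remote_addr zone=per_ip:10m rate=10r/s;

    server {
        listen 80;
        root /var/www/html;

        location / {
            limit_req zone=per_ip burst=50 nodelay;
            # Requests beyond the burst are rejected (HTTP 503 by default).
        }
    }
}
```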
-
No, I'm telling Perplexity they can just buy their obstacle.
People who use the things you have described, for free, are themselves the products being sold; this is implied in the price.
I think in Cloudflare's case the free-tier website owners are more an example of giving users a limited product in hopes of enticing them to upgrade to the paid product, with more features and better performance. Cloudflare might get some benefit from the ability to track end users across more websites as part of its efforts to determine who is a real human versus a potentially malicious bot, but I don't think that really gives the same ROI that Facebook or other services extract from their "free" offerings, where the users are the actual product.
-
LOL "human sourced traffic" oh the tragedy. I for one am rooting for perplexity to go out of business forever.
I for one am rooting for perplexity to go out of business forever.
Yeah, I know.
You’re engaging in motivated reasoning. That’s why you’re saying irrational things: you’re working backwards from a conclusion (AI bad).
-
Ehhhh, you are gaining access to content on the assumption that you are going to interact with ads and thus bring revenue to the person and/or company producing said content. If you block ads, you remove the authorisation the ads grant you.
There was no header on the request saying "I want ads", though.
-
And you’re not even close.
The AI doesn't just do a web search and display a page; it grabs the search results and scrapes multiple pages far faster than a person could.
It doesn't matter whether a human initiated it when the load on the website is far, far higher and more intrusive in a shorter period of time with AI compared to a human doing a web search and reading the content themselves.
-
This post did not contain any content.
Good. I went through my CF panel and blocked some of those "AI Assistants" that were open by default, including Perplexity's.
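If the built-in toggles don't cover a bot you care about, a custom WAF rule keyed on the announced user agent does roughly the same thing. A hypothetical rule expression (example user-agent strings, and it only catches bots that identify themselves):

```
# Hypothetical Cloudflare custom rule expression, with the action set to Block:
(http.user_agent contains "PerplexityBot") or (http.user_agent contains "GPTBot")
```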
-
When sites put up challenges like Anubis or other measures to verify that the visitor isn't a robot, and scrapers then employ spoofing or other means to thwart that verification, I think that reasonably counts as a violation of the CFAA, at least in spirit, especially since these mass scraping operations are drawing attention for the damage they cause to site operators (another factor in the CFAA, and one that would elevate this to felony activity).
The fact is these laws are already on the books; we may as well use them to shut down the objectively harmful activity these AI scrapers are engaged in.
The fact is these laws are already on the books; we may as well use them to shut down the objectively harmful activity these AI scrapers are engaged in.
Silly plebe! Those laws are there to target the working class, not to be used against corporations. See: Copyright.