linux-nerds.org

The AI company Perplexity is complaining their bots can't bypass Cloudflare's firewall

Technology

138 posts, 79 commenters, 1 view

davriellelouna@lemmy.world

This post did not contain any content.
kissaki@feddit.org

wrote, last edited by kissaki@feddit.org

#127

Perplexity argues that a platform’s inability to differentiate between helpful AI assistants and harmful bots causes misclassification of legitimate web traffic.

So, I assume Perplexity uses appropriate identifiable user-agent headers, to allow hosters to decide whether to serve them one way or another?
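For illustration, here is a minimal sketch of what "deciding whether to serve them" could look like on the hoster's side when a crawler does declare itself. The token list and the function names `classify_user_agent` and `decide` are assumptions made up for this example, not an official registry or anyone's real configuration.

```python
from typing import Optional

# Hypothetical list of declared crawler tokens (an assumption for illustration).
DECLARED_AI_AGENTS = ("PerplexityBot", "Perplexity-User", "GPTBot")


def classify_user_agent(user_agent: Optional[str]) -> str:
    """Return 'missing', 'declared-ai-bot', or 'other' for a User-Agent value."""
    if not user_agent:
        return "missing"
    lowered = user_agent.lower()
    # Case-insensitive substring match against the declared tokens.
    if any(token.lower() in lowered for token in DECLARED_AI_AGENTS):
        return "declared-ai-bot"
    return "other"


def decide(user_agent: Optional[str], allow_ai_bots: bool = False) -> int:
    """Pick an HTTP status: 403 for declared AI bots unless allowed, else 200."""
    if classify_user_agent(user_agent) == "declared-ai-bot" and not allow_ai_bots:
        return 403
    return 200
```

This only helps with crawlers that identify themselves honestly. A request with a spoofed browser User-Agent lands in `other` and gets served, which is exactly the gap the question above is pointing at.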

51
jqubed@lemmy.world

I think in Cloudflare’s case the free tier website owners are more an example of just giving the users a limited product in hopes of enticing them to upgrade to the paid product with more features and better performance. Cloudflare might get some benefit in the ability to track end-users across more websites as part of their efforts to determine who is a real human versus a potentially-malicious bot, but I don’t think that really gives the same ROI like Facebook or other services extract from their “free” services where the users are the actual product.
_cryptagion@anarchist.nexus

wrote

#128

Actually, they've said that their free tier is what gives them a paid tier to sell to other people. They know most people aren't going to buy anything from them, but they are fine with that because they get to collect a ton of data about who is using hundreds of thousands of websites in order to figure out what traffic is bad. Without that huge user base, they can't do what they do.

And judging from the article, it's working out for them.

1
glitchvid@lemmy.world

When sites put up challenges like Anubis or other measures to authenticate that the viewer isn't a robot, and scrapers then employ measures to thwart that authentication (via spoofing or other means), I think that's reasonably a violation of the CFAA in spirit, especially since these mass scraping activities are getting attention for the damage they are causing to site operators (another factor in the CFAA, and one that would promote this to felony activity).

The fact is these laws are already on the books, we may as well utilize them to shut down this objectively harmful activity AI scrapers are doing.
aatube@lemmy.dbzer0.com

wrote

#129

That same logic is how Aaron Swartz was cornered into suicide for scraping JSTOR. A wide range of legal experts agree that this reading of the CFAA is a bad idea, including SCOTUS, whose 2021 decision in Van Buren v. United States struck this interpretation off the books.

1
davriellelouna@lemmy.world

This post did not contain any content.
tibi@lemmy.world

wrote

#130

You could say they are... Perplexed.

39
wolflink@sh.itjust.works

This is a nice CloudFlare ad
pyre@lemmy.world

wrote

#131

yeah. still not worth dealing with fucking cloudflare. fuck cloudflare.

16
pyre@lemmy.world

yeah. still not worth dealing with fucking cloudflare. fuck cloudflare.
int32@lemmy.dbzer0.com

wrote

#132

DEATH TO CLOUDFLARE!

5
kissaki@feddit.org

Perplexity argues that a platform’s inability to differentiate between helpful AI assistants and harmful bots causes misclassification of legitimate web traffic.

So, I assume Perplexity uses appropriate identifiable user-agent headers, to allow hosters to decide whether to serve them one way or another?
lime@feddit.nu

wrote

#133

yeah it's almost like there was already a system for this in place
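One reading of "a system for this" is the declared User-Agent combined with robots.txt, so here is a small sketch of that check using Python's standard `urllib.robotparser`. The robots.txt contents, the `PerplexityBot` token, and the `may_fetch` helper are assumptions for illustration only, not any site's real policy.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: refuse one declared crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots.txt permits `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse in-memory text, no network I/O
    return parser.can_fetch(agent, url)
```

Like the User-Agent header itself, this is purely voluntary: it only constrains crawlers that choose to read the file and obey it.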

13
tomalley8342@lemmy.world

Nah, that would also mean using Newpipe, YoutubeDL, Revanced, and Tachiyomi would be a crime, and it would only take the re-introduction of WEI to extend that criminalization to the rest of the web ecosystem. It would be extremely shortsighted and foolish of me to cheer on the criminalization of user spoofing and browser automation because of this.
glitchvid@lemmy.world

wrote, last edited by glitchvid@lemmy.world

#134

Do you think DoS/DDoS activities should be criminal?

If you're a site operator and the mass AI scraping is genuinely causing operational problems (not hard to imagine, I've seen what it does to my hosted repositories pages) should there be recourse? Especially if you're actively trying to prevent that activity (revoking consent in cookies, authorization captchas).

In general I think the idea of "your right to swing your fists ends at my face" applies reasonably well here — these AI scraping companies are giving lots of admins bloody noses and need to be held accountable.

I really am amenable to arguments wrt the right to an open web, but look at how many sites are hiding behind CF and other portals, or outright becoming hostile to any scraping at all; we're already seeing the rapid death of the ideal because of these malicious scrapers, and we should be using all available recourse to stop this bleeding.

0
kokesh@lemmy.world

Is there some simply deployable PHP honeytrap for AI crawlers?
blargh513@sh.itjust.works

wrote

#135

I used to make tarpits with reverse proxies. Accept the connection, then hold the response until a few seconds before the default TCP timeout. It doesn't eat many resources as long as you have enough TCP connections and can reuse them effectively.
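A rough sketch of the same idea, swapping the reverse proxy for a bare Python `asyncio` server (the question asked for PHP, so treat this as a stand-in for the technique, not a drop-in answer). The 25-second hold, the port, and the names `tarpit` and `serve` are assumptions; the right delay depends on whatever timeout the crawler actually uses.

```python
import asyncio
from functools import partial

# Minimal, valid HTTP response sent once the hold time has elapsed.
RESPONSE = b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\nConnection: close\r\n\r\n"


async def tarpit(reader: asyncio.StreamReader, writer: asyncio.StreamWriter,
                 hold_seconds: float) -> None:
    """Accept the connection, stall for hold_seconds, then answer and close."""
    try:
        # A sleeping coroutine holds the socket open at very little cost.
        await asyncio.sleep(hold_seconds)
        writer.write(RESPONSE)
        await writer.drain()
    except ConnectionError:
        pass  # the client gave up before the hold time elapsed
    finally:
        writer.close()


async def serve(host: str = "127.0.0.1", port: int = 8080,
                hold_seconds: float = 25.0) -> asyncio.AbstractServer:
    """Start the tarpit listener; the hold time is an assumed value."""
    handler = partial(tarpit, hold_seconds=hold_seconds)
    return await asyncio.start_server(handler, host, port)
```

To run it, await `serve()` inside a coroutine and then call `serve_forever()` on the returned server. Each stalled client costs one open socket and one sleeping task, which matches the point above that the limit is mostly how many connections you can keep open.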

0
glitchvid@lemmy.world

Do you think DoS/DDoS activities should be criminal?

If you're a site operator and the mass AI scraping is genuinely causing operational problems (not hard to imagine, I've seen what it does to my hosted repositories pages) should there be recourse? Especially if you're actively trying to prevent that activity (revoking consent in cookies, authorization captchas).

In general I think the idea of "your right to swing your fists ends at my face" applies reasonably well here — these AI scraping companies are giving lots of admins bloody noses and need to be held accountable.

I really am amenable to arguments wrt the right to an open web, but look at how many sites are hiding behind CF and other portals, or outright becoming hostile to any scraping at all; we're already seeing the rapid death of the ideal because of these malicious scrapers, and we should be using all available recourse to stop this bleeding.
tomalley8342@lemmy.world

wrote

#136

DoS attacks are already a crime, so of course the need for some kind of solution is clear. But any proposal that gatekeeps the internet and restricts the freedoms with which the user can interact with it is no solution at all. To me, the openness of the web shouldn't be something that people just consider, or are amenable to. It should be the foundation that all reasonable proposals treat as a fundamental truth.

0
jve@lemmy.world

Right? Isn’t this a textbook DMCA violation, too?
whyjiffie@sh.itjust.works

wrote

#137

for us, not for them. wait until they argue in court that actually it's us at fault and we need to provide access or else

0
davriellelouna@lemmy.world

This post did not contain any content.
kittenzrulz123@lemmy.blahaj.zone

wrote

#138

3
