linux-nerds.org

The AI company Perplexity is complaining their bots can't bypass Cloudflare's firewall

Technology

126 posts 69 commenters 0 views

davriellelouna@lemmy.world

wrote, last edited by davriellelouna@lemmy.world

#1

This post did not contain any content.
32 replies

447
In reply to davriellelouna@lemmy.world:
floquant@lemmy.dbzer0.com

wrote

#2

It's difficult to be a shittier company than OpenAI, but Perplexity seems to be trying hard.
1 reply

151
In reply to davriellelouna@lemmy.world:
jeebaichow@lemmy.world

wrote

#3

Uh.. good?
1 reply

74
In reply to davriellelouna@lemmy.world:
panda_abyss@lemmy.ca

wrote, last edited by panda_abyss@lemmy.ca

#4

I actually agree with them.

This feels like Cloudflare trying to collect rent from both sides instead of doing what’s best for the website owners.

There is a problem with AI crawlers, but these technologies are essentially doing a search, fetching several pages, scanning/summarizing them, then presenting the findings to the user.

I don’t really think that’s wrong; it’s just a faster version of rummaging through the SEO shit you do when you Google something.

(I’ve never used Perplexity, but I do use Kagi’s ki assistant for similar searches. It runs 3 searches, scans the top results, and then provides citations.)
3 replies

11
In reply to davriellelouna@lemmy.world:
ekybio@lemmy.world

wrote

#5

Can someone with more knowledge shine a bit more light on this whole situation? I'm out of the loop on the technical details.
3 replies

15
In reply to davriellelouna@lemmy.world:
sylver_dragon@lemmy.world

wrote

#6

You'd think that a competent technology company with their own AI would be able to figure out a way to spoof Cloudflare's checks. I'd still think that.
3 replies

32
In reply to panda_abyss@lemmy.ca:
drspod@lemmy.ml

wrote

#7

What’s best for the website owners is to have people actually visit and interact with their website. Blocking AI tools is consistent with that.
1 reply

30
In reply to ekybio@lemmy.world:
panda_abyss@lemmy.ca

wrote, last edited by panda_abyss@lemmy.ca

#8

Cloudflare runs as a CDN/cache/gateway service in front of a ton of websites. Their service is to help protect against DDoS and malicious traffic.

A few weeks ago Cloudflare announced they were going to block AI crawling (good, in my opinion). However, they also added a paid service that these AI crawlers can use, so it actually becomes a revenue source for them.

This is a response to that from Perplexity, who run an AI search company. I don’t actually know how their service works, but they were specifically called out in the announcement, and Cloudflare accused them of “stealth scraping”, ignoring robots.txt, and other things.
4 replies

23
In reply to panda_abyss@lemmy.ca:
kopasz7@sh.itjust.works

wrote

#9

Search engines have been going relatively fine for decades now. But the crawlers from AI companies basically DDoS hosts in comparison, sending so many requests in such a short interval. They also crawl dynamic links that are expensive to render compared to a static page, ignore robots.txt entirely, or even use it to discover unlinked pages.

Servers have finite resources, especially self-hosted sites, while AI companies have disproportionately more at their disposal, easily grinding other systems to a halt by overwhelming them with requests.
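
The pacing problem described above is usually handled on the crawler side with per-host throttling. A minimal Python sketch, assuming a hypothetical `HostThrottle` helper and an arbitrary one-second interval (neither comes from this thread or from any real crawler):

```python
import time
from urllib.parse import urlsplit


class HostThrottle:
    """Enforce a minimum delay between requests to the same host (hypothetical helper)."""

    def __init__(self, min_interval=1.0, clock=time.monotonic):
        self.min_interval = min_interval  # seconds between hits per host (assumed value)
        self.clock = clock                # injectable so the logic can be checked without sleeping
        self._next_allowed = {}           # host -> earliest time the next request may start

    def wait_time(self, url):
        """Reserve the next slot for this URL's host and return how long to sleep first."""
        host = urlsplit(url).netloc
        now = self.clock()
        start = max(now, self._next_allowed.get(host, now))
        self._next_allowed[host] = start + self.min_interval
        return start - now
```

A caller would do `time.sleep(throttle.wait_time(url))` before each request, so a burst of links to one small site gets spread out while requests to other hosts proceed immediately.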
1 reply

12
In reply to drspod@lemmy.ml:
panda_abyss@lemmy.ca

wrote, last edited by panda_abyss@lemmy.ca

#10

For a lot of AI search I actually end up reading the pages, so I don’t know how much this stops that
1 reply

3
In reply to ekybio@lemmy.world:
spankmonkey@lemmy.world

wrote

#11

AI crawlers tend to overwhelm websites by doing the least efficient scraping of data possible, basically DDoSing a huge portion of the internet. Perplexity already scraped the net for training data and is now hammering it inefficiently for searches.

Cloudflare is just trying to keep the bots from overwhelming everything.
1 reply

41
In reply to sylver_dragon@lemmy.world:
spankmonkey@lemmy.world

wrote, last edited by spankmonkey@lemmy.world

#12

Or find a more efficient way to manage data, since their current approach is basically DDoSing the internet for training data and also for responding to user interactions.
1 reply

54
In reply to davriellelouna@lemmy.world:
xxce2aab@feddit.dk

wrote

#13

Ooh, that's tough, sweetheart. If the owners of those servers want you to visit, they'll just choose a WAF other than CF's.

All zero of them.
1 reply

2
In reply to davriellelouna@lemmy.world:
dzajew@piefed.social

wrote

#14

Cry me a river
1 reply

3
In reply to davriellelouna@lemmy.world:
ermiar@lemmy.world

wrote, last edited by ermiar@lemmy.world

#15
1 reply

15
In reply to ekybio@lemmy.world:
betadoggo_@lemmy.world

wrote, last edited by betadoggo_@lemmy.world

#16

Perplexity (an "AI search engine" company with 500 million in funding) can't bypass Cloudflare's anti-bot checks. For each search, Perplexity scrapes the top results and summarizes them for the user. Cloudflare intentionally blocks Perplexity's scrapers because they ignore robots.txt and mimic real users to get around Cloudflare's blocking features. Perplexity argues that their scraping is acceptable because it's user-initiated.

Personally I think Cloudflare is in the right here. The scraped sites get 0 revenue from Perplexity searches (unless the user decides to go through the sources section and click the links), and Perplexity's scraping is unnecessarily traffic-intensive since they don't cache the scraped data.
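
For illustration, here is a minimal Python sketch of the two behaviours this comment says are missing: honouring robots.txt and caching fetched pages. It uses the standard library's `urllib.robotparser`; the `CachedFetcher` class, the `ExampleBot` user agent, the `site.example` host, and the five-minute TTL are all hypothetical, and nothing here reflects how Perplexity or Cloudflare actually work.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download it from the site.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/
"""


def load_robots(text):
    """Parse robots.txt text that has already been retrieved."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


class CachedFetcher:
    """Fetch a URL only if robots.txt allows it, reusing recent results (hypothetical helper)."""

    def __init__(self, fetch, robots, user_agent, ttl=300.0, clock=time.monotonic):
        self.fetch = fetch            # callable url -> body, supplied by the caller
        self.robots = robots          # a parsed RobotFileParser
        self.user_agent = user_agent
        self.ttl = ttl                # seconds a cached body stays fresh (assumed value)
        self.clock = clock
        self._cache = {}              # url -> (time stored, body)

    def get(self, url):
        # Respect the site's stated wishes before doing anything else.
        if not self.robots.can_fetch(self.user_agent, url):
            return None
        now = self.clock()
        hit = self._cache.get(url)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]             # fresh copy: no request reaches the origin server
        body = self.fetch(url)
        self._cache[url] = (now, body)
        return body
```

With this shape, repeated user searches that land on the same page within the TTL cost the origin server one request instead of one per search, and disallowed paths are never requested at all.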
1 reply

12
In reply to davriellelouna@lemmy.world:
ordnance_qf_17_pounder@reddthat.com

wrote

#17

Oh no!
1 reply

7
In reply to sylver_dragon@lemmy.world:
lemmyng@piefed.ca

wrote

#18

Perplexity: "But that would cost us moneeyyyy!"
1 reply

13
In reply to sylver_dragon@lemmy.world:
quill7513@slrpnk.net

wrote

#19

see, but they're not competent. further, they don't care. most of these ai companies are snake oil. they're selling you a solution that doesn't meaningfully solve a problem. their main way of surviving is saying "this is what it can do now, just imagine what it can do if you invest money in my company."

they're scammers, the lot of them, running ponzi schemes with our money. if the planet dies for it, that's no concern of theirs. ponzi schemes require the schemer to have no long term plan, just a line of credit that they can keep drawing from until they skip town before the tax collector comes
1 reply

23
In reply to panda_abyss@lemmy.ca:
very_well_lost@lemmy.world

wrote, last edited by very_well_lost@lemmy.world

#20

A few weeks ago Cloudflare announced they were going to block AI crawling (good, in my opinion). However, they also added a paid service that these AI crawlers can use, so it actually becomes a revenue source for them.

I think it's also worth pointing out that all of the big AI companies are currently burning through cash at an absolutely astonishing rate, and none of them are anywhere close to being profitable. So pay-walling the data they use is probably gonna be pretty painful for their already-tortured bottom line (good).
1 reply

23
