linux-nerds.org

The AI company Perplexity is complaining their bots can't bypass Cloudflare's firewall

Technology

119 posts, 67 commenters, 0 views

D davriellelouna@lemmy.world

This post did not contain any content.
ermiar@lemmy.world

wrote, last edited by ermiar@lemmy.world

#15
1 reply

15
E ekybio@lemmy.world

Can someone with more knowledge shine a bit more light on this whole situation? I'm out of the loop on the technical details
betadoggo_@lemmy.world

wrote, last edited by betadoggo_@lemmy.world

#16

Perplexity (an "AI search engine" company with 500 million in funding) can't bypass cloudflare's anti-bot checks. For each search Perplexity scrapes the top results and summarizes them for the user. Cloudflare intentionally blocks perplexity's scrapers because they ignore robots.txt and mimic real users to get around cloudflare's blocking features. Perplexity argues that their scraping is acceptable because it's user initiated.

Personally I think cloudflare is in the right here. The scraped sites get 0 revenue from Perplexity searches (unless the user decides to go through the sources section and click the links) and Perplexity's scraping is unnecessarily traffic intensive since they don't cache the scraped data.
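For readers unfamiliar with robots.txt, here is a minimal Python sketch of the check a well-behaved scraper would run before fetching a page. It uses the standard library's `urllib.robotparser`. The robots.txt content, the `ExampleAIBot` user agent, and the `may_fetch` helper are made up for illustration; they are not Perplexity's or Cloudflare's actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; not taken from any real site.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The bot named in robots.txt is shut out of the whole site.
    print(may_fetch("ExampleAIBot", "https://example.com/article"))
    # Any other agent is only kept out of /private/.
    print(may_fetch("SomeBrowser", "https://example.com/article"))
```

robots.txt is purely advisory: nothing stops a client from skipping this check or sending a different user agent, which is the behaviour Cloudflare is accused of blocking here.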
1 reply

12
D davriellelouna@lemmy.world

This post did not contain any content.
ordnance_qf_17_pounder@reddthat.com

wrote

#17

Oh no!
1 reply

7
S sylver_dragon@lemmy.world

You'd think that a competent technology company, with their own AI would be able to figure out a way to spoof Cloudflare's checks. I'd still think that.
lemmyng@piefed.ca

wrote

#18

Perplexity: "But that would cost us moneeyyyy!"
1 reply

13
S sylver_dragon@lemmy.world

You'd think that a competent technology company, with their own AI would be able to figure out a way to spoof Cloudflare's checks. I'd still think that.
quill7513@slrpnk.net

wrote

#19

see, but they're not competent. further, they don't care. most of these ai companies are snake oil. they're selling you a solution that doesn't meaningfully solve a problem. their main way of surviving is saying "this is what it can do now, just imagine what it can do if you invest money in my company."

they're scammers, the lot of them, running ponzi schemes with our money. if the planet dies for it, that's no concern of theirs. ponzi schemes require the schemer to have no long term plan, just a line of credit that they can keep drawing from until they skip town before the tax collector comes
1 reply

23
P panda_abyss@lemmy.ca

Cloudflare runs as a CDN/cache/gateway service in front of a ton of websites. Their service is to help protect against DDOS and malicious traffic.

A few weeks ago cloudflare announced they were going to block AI crawling (good, in my opinion). However they also added a paid service that these AI crawlers can use, so it actually becomes a revenue source for them.

This is a response to that from Perplexity who run an AI search company. I don’t actually know how their service works, but they were specifically called out in the announcement and Cloudflare accused them of “stealth scraping” and ignoring robots.txt and other things.
very_well_lost@lemmy.world

wrote, last edited by very_well_lost@lemmy.world

#20

A few weeks ago cloudflare announced they were going to block AI crawling (good, in my opinion). However they also added a paid service that these AI crawlers can use, so it actually becomes a revenue source for them.

I think it's also worth pointing out that all of the big AI companies are currently burning through cash at an absolutely astonishing rate, and none of them are anywhere close to being profitable. So pay-walling the data they use is probably gonna be pretty painful for their already-tortured bottom line (good).
1 reply

23
P panda_abyss@lemmy.ca

For a lot of AI search I actually end up reading the pages, so I don’t know how much this stops that
astralpath@lemmy.ca

wrote

#21

You're the outlier, I promise. People are literally forfeiting their brains in favor of an LLM transplant these days.
1 reply

11
P panda_abyss@lemmy.ca

I actually agree with them

This feels like cloudflare trying to collect rent from both sides instead of doing what’s best for the website owners.

There is a problem with AI crawlers, but these technologies are essentially doing a search, fetching several pages, scanning/summarizing them, then presenting the findings to the user.

I don’t really think that’s wrong, it’s just a faster version of rummaging through the SEO shit you do when you Google something.

(I’ve never used Perplexity, but I do use Kagi’s AI assistant for similar searches. It runs 3 searches, scans the top results, and then provides citations.)
pr06lefs@lemmy.ml

wrote

#22

If a neighborhood is beset by roving bands of thieves, sooner or later strangers will be greeted by a shotgun rather than an invitation to tea, regardless of their intentions. Them's the breaks. Bots are going to take a hit now and their operators are just going to have to deal with it. Sucks when people don't play nice, but this is what you get.
1 reply

3
D davriellelouna@lemmy.world

This post did not contain any content.
glitchvid@lemmy.world

wrote

#23

When a firm outright admits to bypassing or trying to bypass measures taken to keep them out, you'd think that would be a slam dunk case of unauthorized access under the CFAA with felony enhancements.
2 replies

152
A astralpath@lemmy.ca

You're the outlier, I promise. People are literally forfeiting their brains in favor of an LLM transplant these days.
pennomi@lemmy.world

wrote

#24

On the flip side, most websites are so ad-ridden these days a reader mode or other summary tool is almost required for normal browsing. Not saying that AI is the right move, but I can understand not wanting to visit the actual page any more.
2 replies

3
D davriellelouna@lemmy.world

This post did not contain any content.
interdimensionalmeme@lemmy.ml

wrote

#25

Just buy cloudflare duh
1 reply

5
D davriellelouna@lemmy.world

This post did not contain any content.
baroqueinmind@piefed.social

wrote

#26

Cry more, Perplexity.
1 reply

12
P panda_abyss@lemmy.ca

Cloudflare runs as a CDN/cache/gateway service in front of a ton of websites. Their service is to help protect against DDOS and malicious traffic.

A few weeks ago cloudflare announced they were going to block AI crawling (good, in my opinion). However they also added a paid service that these AI crawlers can use, so it actually becomes a revenue source for them.

This is a response to that from Perplexity who run an AI search company. I don’t actually know how their service works, but they were specifically called out in the announcement and Cloudflare accused them of “stealth scraping” and ignoring robots.txt and other things.
roguebanana@piefed.zip

wrote

#27

But the website owner can still choose to continue blocking them right? Without using additional stuff like Anubis that is.
1 reply

2
D davriellelouna@lemmy.world

This post did not contain any content.
fauxliving@lemmy.world

wrote, last edited by fauxliving@lemmy.world

#28

The amount of people just reacting to the headline in the comments on these kinds of articles is always surprising.

Your browser acts as an agent too, you don’t manually visit every script link, image source and CSS file. Everyone has experienced how annoying it is to have your browser be targeted by Cloudflare.

There’s a pretty major difference between a human user loading a page and having it summarized and a bot that is scraping 1500 pages/second.

Cheering for Cloudflare to be the arbiter of what technologies are allowed is incredibly short sighted. They exist to provide their clients with services, including bot mitigation. But a user initiated operation isn’t the same as a bot.

Which is the point of the article and the article’s title.

It isn’t clear why OP had to alter the headline to bait the anti-ai crowd.
5 replies

11
D davriellelouna@lemmy.world

This post did not contain any content.
iavicenna@lemmy.world

wrote

#29

ask AI how to do it?
1 reply

28
P pr06lefs@lemmy.ml

If a neighborhood is beset by roving bands of thieves, sooner or later strangers will be greeted by a shotgun rather than an invitation to tea, regardless of their intentions. Them's the breaks. Bots are going to take a hit now and their operators are just going to have to deal with it. Sucks when people don't play nice, but this is what you get.
fauxliving@lemmy.world

wrote

#30

I’m sure people that are attempting to drive to their house in a new vehicle wouldn’t appreciate being riddled with bullets because the neighborhood watch makes no attempt to distinguish between thieves and homeowners.
1 reply

0
D davriellelouna@lemmy.world

This post did not contain any content.
kescusay@lemmy.world

wrote

#31

I set up a WAF for my company's publicly facing developer portal to block out bot traffic from assholes like these guys. It reduced bot traffic to the site by something like - I kid you not - 99.999%.

Fucking data vultures.
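The post doesn't say which WAF rules were used. As one illustration of the kind of rule that separates bulk scrapers from ordinary visitors, here is a minimal Python sketch of a per-client sliding-window rate check. The class name `RateRule`, the thresholds, and the client identifiers are all assumptions, not the poster's setup or any vendor's API.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class RateRule:
    """Block a client once it exceeds max_requests within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Timestamps of recently allowed requests, per client.
        self._seen: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str, now: float) -> bool:
        timestamps = self._seen[client_id]
        # Drop requests that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_requests:
            return False
        timestamps.append(now)
        return True


if __name__ == "__main__":
    rule = RateRule(max_requests=3, window_seconds=1.0)
    # Five requests within half a second: the last two are rejected.
    print([rule.allow("bot", i * 0.1) for i in range(5)])
```

Real WAFs typically combine rate signals like this with other checks such as user-agent and IP reputation; a rate rule alone is only a starting point.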
1 reply

9
F fauxliving@lemmy.world

I’m sure people that are attempting to drive to their house in a new vehicle wouldn’t appreciate being riddled with bullets because the neighborhood watch makes no attempt to distinguish between thieves and homeowners.
pr06lefs@lemmy.ml

wrote

#32

So sad for them. Try not living in a war zone?
1 reply

0
G glitchvid@lemmy.world

When a firm outright admits to bypassing or trying to bypass measures taken to keep them out, you'd think that would be a slam dunk case of unauthorized access under the CFAA with felony enhancements.
gamingchairmodel@lemmy.world

wrote

#33

Fuck that. I don't need prosecutors and the courts to rule that accessing publicly available information in a way that the website owner doesn't want is literally a crime. That logic would extend to ad blockers and editing HTML/JS in an "inspect element" tab.
2 replies

64
F fauxliving@lemmy.world

The amount of people just reacting to the headline in the comments on these kinds of articles is always surprising.

Your browser acts as an agent too, you don’t manually visit every script link, image source and CSS file. Everyone has experienced how annoying it is to have your browser be targeted by Cloudflare.

There’s a pretty major difference between a human user loading a page and having it summarized and a bot that is scraping 1500 pages/second.

Cheering for Cloudflare to be the arbiter of what technologies are allowed is incredibly short sighted. They exist to provide their clients with services, including bot mitigation. But a user initiated operation isn’t the same as a bot.

Which is the point of the article and the article’s title.

It isn’t clear why OP had to alter the headline to bait the anti-ai crowd.
spankmonkey@lemmy.world

wrote

#34

But a user initiated operation isn’t the same as a bot.

Oh fuck off with that AI company propaganda.

The AI companies already overwhelmed sites to get training data and are repeating their shitty scraping practices when users interact with their AI. It's the same fucking thing.

Web crawlers for search engines don't scrape pages every time a user searches like AI does. Both web crawlers and scrapers are bots, and how a human initiates their operation, scheduled or not, doesn't matter as much as the fact that they do things very differently and only one of the two respects robots.txt.
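The difference between fetching on every query and crawling once can be sketched with a small time-to-live cache in Python. Everything here is hypothetical (`PageCache`, `fake_fetch`, the TTL value); it only illustrates how caching limits how often the origin site gets hit and is not a description of how any search engine or AI product actually works.

```python
import time
from typing import Callable, Dict, Tuple


class PageCache:
    """Tiny in-memory cache: refetch a URL only after ttl_seconds have passed."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}
        self.origin_hits = 0  # how often the origin site was actually contacted

    def get(self, url: str) -> str:
        now = self._clock()
        cached = self._store.get(url)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # still fresh: serve from cache
        self.origin_hits += 1
        body = self._fetch(url)
        self._store[url] = (now, body)
        return body


def fake_fetch(url: str) -> str:
    """Stand-in for a real HTTP request, so the sketch runs offline."""
    return f"<html>content of {url}</html>"


if __name__ == "__main__":
    cache = PageCache(fake_fetch, ttl_seconds=600)
    for _ in range(1000):  # 1000 user queries that all need the same page
        cache.get("https://example.com/article")
    print(cache.origin_hits)  # the origin site was contacted only once
```

Without the cache, the same 1000 queries would mean 1000 requests to the site, which is the traffic pattern being criticized.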
1 reply

8
