Millions of websites to get 'game-changing' AI bot blocker
-
The problem is that the biggest service Cloudflare provides is DDoS protection, and doing that requires having more bandwidth available than your attacker. Having enough bandwidth to withstand modern botnet-powered DDoS attacks is ridiculously expensive (and bandwidth is also a finite resource; there's only so much backbone infrastructure). Basically, it's economically infeasible for multiple companies to provide the service Cloudflare does. You might be able to get away with two companies doing so, but it's unlikely you could manage more than that without some of them going bankrupt.
I wonder if it would be a good investment for a country to have its own, then down the line expand to sell the same service to others.
-
The problem is that the biggest service Cloudflare provides is DDoS protection, and doing that requires having more bandwidth available than your attacker. Having enough bandwidth to withstand modern botnet-powered DDoS attacks is ridiculously expensive (and bandwidth is also a finite resource; there's only so much backbone infrastructure). Basically, it's economically infeasible for multiple companies to provide the service Cloudflare does. You might be able to get away with two companies doing so, but it's unlikely you could manage more than that without some of them going bankrupt.
When a critical service is not economical for more than one business to provide (a natural monopoly), that's when the govt should be stepping in.
-
I wonder if it would be a good investment for a country to have its own, then down the line expand to sell the same service to others.
It's OurFlare, comrade.
-
This post did not contain any content.
This is not about stopping bot-scrapers, it's about charging them.
-
Until the AI companies find a way around it. Love the idea, so hopefully it causes at least 3 days of struggle for the AI crawlers.
Having said that... Can someone else put this in place, so we don't have Cloudflare hosting everything and end up just one intern away from a global outage? Please? Pretty please?
Proof of work seems to be working pretty well for many websites.
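For context, a minimal sketch of the proof-of-work idea behind tools like Anubis: the server hands the client a random challenge, and the client must burn CPU finding a nonce whose hash clears a difficulty target before it gets the page. Cheap to verify, expensive to mass-produce, which is exactly what hurts bulk crawlers. The function names and the SHA-256 leading-zero-bits scheme here are illustrative, not any particular tool's actual protocol.

```python
import hashlib
import itertools
import os

def find_nonce(challenge: bytes, difficulty_bits: int) -> int:
    """Brute-force a nonce so that SHA-256(challenge || nonce)
    has at least `difficulty_bits` leading zero bits."""
    target = 1 << (256 - difficulty_bits)
    for nonce in itertools.count():
        digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce

def verify(challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Server side: one hash to check what cost the client ~2^difficulty_bits hashes."""
    digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))

challenge = os.urandom(16)          # fresh per-visitor challenge
nonce = find_nonce(challenge, 16)   # ~65k hashes of client work on average
assert verify(challenge, nonce, 16)
```

The asymmetry is the whole trick: a human's browser solves one challenge per visit and never notices, while a crawler hitting millions of pages pays the cost millions of times.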
-
when a critical service is not economical for more than one business to do (natural monopoly), that's when govt should be stepping in.
Which govt? I'm not comfortable with the idea of the current US govt having control over this sort of service.
-
This post did not contain any content.
I can't wait to be denied access to websites because of it. Even more than I already am, that is.
-
This post did not contain any content.
To that end the company is developing a "Pay Per Crawl" system, which would give content creators the option to request payment from AI companies for utilising their original content.
So Cloudflare is not so much "saving the Internet" as becoming a middleman between LLM training companies and content creators. Which I believe has the potential to be a true goldmine in the future.
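Cloudflare has described Pay Per Crawl as built around the long-dormant HTTP 402 Payment Required status code: instead of serving or silently blocking a crawler, the edge quotes it a price. A toy sketch of that idea, with a made-up header name, price, and user-agent list (real enforcement would key off verified bot identity, since UA strings are trivially spoofable):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical crawler User-Agent substrings, purely illustrative.
AI_CRAWLERS = ("GPTBot", "ClaudeBot", "CCBot")
PRICE_USD = "0.01"  # made-up per-request price

class PayPerCrawlHandler(BaseHTTPRequestHandler):
    def log_message(self, *args):
        pass  # keep the sketch quiet

    def do_GET(self):
        ua = self.headers.get("User-Agent", "")
        if any(bot in ua for bot in AI_CRAWLERS):
            # 402 Payment Required: quote a price instead of serving the page.
            self.send_response(402)
            self.send_header("X-Crawl-Price", PRICE_USD)  # illustrative header
            self.end_headers()
            self.wfile.write(b"Payment required to crawl this content.\n")
        else:
            # Ordinary visitors get the page as usual.
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(b"<h1>Hello, human reader</h1>\n")
```

To try it locally: `HTTPServer(("127.0.0.1", 8402), PayPerCrawlHandler).serve_forever()`, then compare a plain `curl` against one sent with `-A GPTBot`. The middleman economics live entirely in who sets `PRICE_USD` and who clears the payments.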
-
To that end the company is developing a "Pay Per Crawl" system, which would give content creators the option to request payment from AI companies for utilising their original content.
So Cloudflare is not so much "saving the Internet" as becoming a middleman between LLM training companies and content creators. Which I believe has the potential to be a true goldmine in the future.
Corps are gonna corp.
-
To that end the company is developing a "Pay Per Crawl" system, which would give content creators the option to request payment from AI companies for utilising their original content.
So Cloudflare is not so much "saving the Internet" as becoming a middleman between LLM training companies and content creators. Which I believe has the potential to be a true goldmine in the future.
Can you DRM a crawl?
-
This is not about stopping bot-scrapers, it's about charging them.
Hopefully people will price their content out of reach of the bot-scrapers, effectively stopping them.
-
Which govt? I'm not comfortable with the idea of the current US govt having control over this sort of service.
Are you comfortable with a single corporation having control over this sort of service? The current government is obviously not ideal, but that shouldn't stop us from regulating monopolies.
-
Until the AI companies find a way around it. Love the idea, so hopefully it causes at least 3 days of struggle for the AI crawlers.
Having said that... Can someone else put this in place, so we don't have Cloudflare hosting everything and end up just one intern away from a global outage? Please? Pretty please?
GitHub - TecharoHQ/anubis: Weighs the soul of incoming HTTP requests to stop AI crawlers (github.com)
-
Until the AI companies find a way around it. Love the idea, so hopefully it causes at least 3 days of struggle for the AI crawlers.
Having said that... Can someone else put this in place, so we don't have Cloudflare hosting everything and end up just one intern away from a global outage? Please? Pretty please?
Yeah, this will have absolutely no impact on gathering training data.
I assumed it was to block AI agents crawling it during requests, which they'd be unlikely to bypass in the web UI.
But no company spending millions on training will hesitate to have an agent appear as a regular desktop user to scrape data.
-
Can you DRM a crawl?
You can if you're Cloudflare.
-
Yeah, this will have absolutely no impact on gathering training data.
I assumed it was to block AI agents crawling it during requests, which they'd be unlikely to bypass in the web UI.
But no company spending millions on training will hesitate to have an agent appear as a regular desktop user to scrape data.
Does Cloudflare still look at the agent? I thought they had more reliable data points.
-
Does Cloudflare still look at the agent? I thought they had more reliable data points.
I meant an AI agent, not the browser agent. All data points can be spoofed, and if not, they'll pay a human to scrape before they pay for content.
-
This post did not contain any content.
So... Proprietary Anubis?
-
This post did not contain any content.
I didn't read "bot blocker" wrong, that's for sure...
-
I meant an AI agent, not the browser agent. All data points can be spoofed, and if not, they'll pay a human to scrape before they pay for content.
Okay, fair enough, I thought you meant just the user agent. The trouble with having a bot make it look like an actual user is viewing the data is that it's slow and inefficient. The trouble with paying humans to scrape the data is the same: slow and inefficient. These companies want to ingest data ridiculously fast because there's so much of it. If all else fails, they'll resort to paying the content creators, but only for data they really think gives their model a competitive edge in some metric and that they can't pirate. E.g. I can see them paying for scientific research they can't get from libgen, but not for some rando's blog post or a local news website.