linux-nerds.org

Millions of websites to get 'game-changing' AI bot blocker

Technology

28 posts 23 commenters 22 views

C concave1142@lemmy.world

Until the AI companies find a way around it. Love the idea so hopefully it causes at least 3 days of struggle for the AI crawlers.

Having said that... Can someone else put this in place so we do not have Cloudflare hosting everything where we would just be one intern away from a global outage. Please? Pretty please?
auraithx@piefed.social wrote:

#18

Yeah, this will have absolutely no impact on gathering training data.

I assumed it was to block AI agents crawling it during requests, which they’d be unlikely to bypass in the web UI.

But no company spending millions on training will hesitate to have an agent appear as a regular desktop user to scrape data.
1 reply

1
I interdimensionalmeme@lemmy.ml

Can you DRM a crawl?
boonhet@sopuli.xyz wrote:

#19

You can if you're Cloudflare.

0
A auraithx@piefed.social

Yeah, this will have absolutely no impact on gathering training data.

I assumed it was to block AI agents crawling it during requests, which they’d be unlikely to bypass in the web UI.

But no company spending millions on training will hesitate to have an agent appear as a regular desktop user to scrape data.
boonhet@sopuli.xyz wrote:

#20

Does Cloudflare still look at the agent? I thought they had more reliable data points.
1 reply

1
B boonhet@sopuli.xyz

Does Cloudflare still look at the agent? I thought they had more reliable data points.
auraithx@piefed.social wrote:

#21

I meant an AI agent, not the browser's user agent. All data points can be spoofed, and if not, they’ll pay a human to scrape before they pay for content.
1 reply

0
D davriellelouna@lemmy.world

This post did not contain any content.
isveryloud@lemmy.ca wrote:

#22

So... Proprietary Anubis?

6
D davriellelouna@lemmy.world

This post did not contain any content.
themurphy@lemmy.ml wrote:

#23

I didn't read "bot blocker" wrong, that's for sure...

0
A auraithx@piefed.social

I meant an AI agent, not the browser's user agent. All data points can be spoofed, and if not, they’ll pay a human to scrape before they pay for content.
boonhet@sopuli.xyz wrote:

#24

Okay, fair enough, I thought you meant just the user agent. The trouble with having a bot make it look like an actual user is looking at the data is that it's slow and inefficient. The trouble with paying humans to scrape the data is that it's slow and inefficient. These companies want to ingest data ridiculously fast because there's so much of it. If all else fails, they'll resort to paying the content creators, but only if it's data they really do think gives their model a competitive edge in some metric and they can't pirate it. E.g., I can see them paying for scientific research they can't get from libgen, but not for some rando's blog post or local news website.

0
A acosmichippo@lemmy.world

are you comfortable with a single corporation having control over this sort of service? the current government is obviously not ideal but that shouldn’t stop us from regulating monopolies.
antonim@lemmy.dbzer0.com wrote:

#25

are you comfortable with a single corporation having control over this sort of service?

Honestly? A tiny bit more than a single country. I have at least some minuscule control over the corporation through voting and local regulations that international corporations must follow, whereas I have absolutely no formal influence on the US govt.

7
I interdimensionalmeme@lemmy.ml

Can you DRM a crawl?
vane@lemmy.world wrote:

#26

Oh yes, DRM the whole internet and wire it to personal ID. Wet dream.

2
D davriellelouna@lemmy.world

This post did not contain any content.
scrollone@feddit.it wrote:

#27

I wish there were an alternative (possibly European) to Cloudflare, because it's so scary to put all our eggs in one basket.

4
I imgonnatrythis@sh.itjust.works

I really wish the answer was a legally enforced robots.txt file that made it easy for any organization or individual posting web data to spell out what the permissions are. I often use an LLM as a search engine, and most of the time the citations are pretty decent; I use those to link out to the source content.
I run a small blog and I'd love to get indexed in an LLM, not blocked, as long as I was assured a reference link for any content used and had some legal recourse if I found my data was being misused.
I don't love the answer being another mega corporation posing as a white knight looking to skim some money off of the "loophole" that is AI copyright infringement.
drmoose@lemmy.world wrote:

#28

How would you legally enforce robots.txt? It's not a legally sound system.
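
To illustrate why: robots.txt is only advisory, and a crawler has to choose to consult it. Here is a minimal sketch using Python's standard `urllib.robotparser`. The bot name `ExampleAIBot`, the rule set, and the helper `is_allowed` are all made up for illustration; they are not taken from any real site or crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one made-up AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/blog/post-1"
    # A crawler that announces itself honestly and respects the file.
    print("ExampleAIBot allowed:", is_allowed("ExampleAIBot", url))
    # The same crawler sending a browser-like user agent string instead.
    print("Mozilla/5.0 allowed:", is_allowed("Mozilla/5.0", url))
```

The check only sees whatever user-agent string the client chooses to send, and nothing forces a client to run it at all, so a crawler that ignores the file or identifies itself as a regular browser is not stopped by it.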

0
