Codeberg: army of AI crawlers are extremely slowing us; AI crawlers learned how to solve the Anubis challenges.
-
Lol okay. Have fun LARPing, comrade.
Capitalism isn’t your friend and will never serve you
-
client-server model. I want sharing model. Like with Briar
Guess what
Briar itself, and every pure P2P decentralized network where all nodes are identical... are built on Internet Sockets which inherently require one party ("server") to start listening on a port, and another party ("client") to start the conversation.
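A minimal sketch of that asymmetry in Python, assuming plain TCP over loopback rather than anything Briar actually does (the helper name `run_peer_pair` is made up for illustration). Both "peers" run the same program, yet one still has to listen before the other can dial:

```python
import socket
import threading


def run_peer_pair(message: bytes) -> bytes:
    """Link two 'identical' peers over loopback; return what the listener got."""
    # One peer must bind and listen first. Port 0 asks the OS for any free port.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]

    received = []

    def accept_one():
        # The listening side waits for whoever dials in.
        conn, _addr = listener.accept()
        with conn:
            chunks = []
            while True:
                data = conn.recv(1024)
                if not data:  # empty read means the dialer closed the stream
                    break
                chunks.append(data)
            received.append(b"".join(chunks))

    worker = threading.Thread(target=accept_one)
    worker.start()

    # The other peer initiates the conversation.
    with socket.create_connection(("127.0.0.1", port)) as dialer:
        dialer.sendall(message)

    worker.join()
    listener.close()
    return received[0]


if __name__ == "__main__":
    print(run_peer_pair(b"hello from the dialing peer"))
```

Once the stream is up, either side can send; the listen/connect split only matters for getting the connection started.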
Briar uses TCP/IP, but it uses Tor routing, which is IMO a smart thing to do
I'm talking about Briar used over BT.
-
Crawlers are expensive and annoying to run, not to mention unreliable and produce low quality data.
If there really were a site dump available, I don't see why it would make sense to crawl the website, except to spot check the dump is actually complete.
This used to be standard and it came with open API access for all before the Silicon Valley royals put the screws on everyone
I wish I was still capable of the same belief in the goodness of others.
-
cross-posted from: https://programming.dev/post/35852706
I use Anubis on my personal website, not because I think anything I’ve written is important enough that companies would want to scrape it, but as a “fuck you” to those companies regardless
That the bots are learning to get around it is disheartening; Anubis was a pain to set up and get running
-
I'm talking about Briar used over BT.
Even AF_BLUETOOTH sockets are... sockets, where one machine ("server") opens to listen, and the other ("client") initiates the stream
-
Even AF_BLUETOOTH sockets are... sockets, where one machine ("server") opens to listen, and the other ("client") initiates the stream
Sockets are an operating system abstraction and have nothing to do with this conversation.
-
The internet came together to define the robots file standard; it could just as easily come up with a standard API for database dumps. But it decided on war instead, starting with the 2023 API wars, and now we're going to see all the small websites die while Facebook gets even more powerful.
Well there you have it. Although I still feel weird that it's somehow "the internet" that's supposed to solve a problem that's fully caused by AI companies and their web crawlers.
If a crawler keeps spamming and breaking a site I see it as nothing short of a DoS attack.
Not to mention that robots.txt is completely voluntary and, as far as I know, mostly ignored by these companies. So then what makes you think that any of them are acting in good faith?
To me that is the core issue and why your position feels so outlandish. It's like having a bully at school that constantly takes your lunch and your solution being: "Just bring them a lunch as well, maybe they'll stop."
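To make the "voluntary" part concrete, here is a rough sketch using Python's standard urllib.robotparser. The bot name `ExampleAIBot`, the example.org URL, and the helper `is_allowed` are all made up for illustration. The rules only have an effect if the crawler bothers to run a check like this before fetching; nothing on the server side enforces it:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one named crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return what a *polite* crawler would conclude from the rules above."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.org/some/repo"
    for agent in ("ExampleAIBot", "SomeBrowser"):
        print(agent, "may fetch" if is_allowed(agent, url) else "is asked not to fetch", url)
```

A crawler that skips the check, or lies about its user agent, gets served the page exactly like anyone else.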
-
Tech bros just actively making the internet worse for everyone.
I mean, tech bros of the past invented the internet
-
Okay what about...what about uhhh...
Static site builders that render the whole page out as an image map, making it visible for humans but useless for crawlers
AI these days reads text from images better than humans can
-
Capitalism isn’t your friend and will never serve you
If only you understood anything you were talking about. Oh well!
-
Well there you have it. Although I still feel weird that it's somehow "the internet" that's supposed to solve a problem that's fully caused by AI companies and their web crawlers.
If a crawler keeps spamming and breaking a site I see it as nothing short of a DoS attack.
Not to mention that robots.txt is completely voluntary and, as far as I know, mostly ignored by these companies. So then what makes you think that any of them are acting in good faith?
To me that is the core issue and why your position feels so outlandish. It's like having a bully at school that constantly takes your lunch and your solution being: "Just bring them a lunch as well, maybe they'll stop."
The solution is breaking intellectual property and making sharing public data easy and efficient. A top-down imposition DESIGNED to crush the giants back down to the level playing field of the small players, into a system where cooperation empowers the small and places the burdens on the big, with the understanding that all public data is "our" data and nobody, including its custodian, should get between US and IT. Something designed by actually competent and clever politicians who will anticipate and counter all the dirty tricks big tech would try to regain the upper hand. I want big tech permanently losing on a field designed to disadvantage anything that accumulates power.
-
I mean, tech bros of the past invented the internet
Those are not the tech bros. The tech bros are the ones who move fast and break things. The internet was built by engineers and developers
-
I wasn't being totally serious, but also, I do think that while accessibility concerns come from a good place, there is some practical limitation that must be accepted when building fringe and counter-cultural things. Like, my hidden rebel base can't have a wheelchair accessible ramp at the entrance, because then my base isn't hidden anymore. It sucks that some solutions can't work for everyone, but if we just throw them out because it won't work for 5% of people, we end up with nothing. I'd rather have a solution that works for 95% of people than no solution at all. I'm not saying that people who use screen readers are second-class citizens. If crawlers were vision-based then I might suggest matching text to background colors so that only screen readers work to understand the site. Because something that works for 5% of people is also better than no solution at all. We need to tolerate having imperfect first attempts and understand that more sophisticated infrastructure comes later.
But yes my image map idea is pretty much a joke nonetheless
Don't worry, we were never going to make anything 100% accessible anyway, that would be impossible.
-
I mean, tech bros of the past invented the internet
Nah, that was DARPA
-
I mean, tech bros of the past invented the internet
Those were tech nerds. "Tech bros" are jabronis who see the tech sector as a way to increase the value of the money their daddies gave them.
-
If only you understood anything you were talking about. Oh well!
Yeah it sure is a shame lol