Reddit will block the Internet Archive
-
This post did not contain any content.
And I will block reddit.
-
Searching anywhere is getting shittier by the day. Web searches are riddled with hallucinated, AI-generated garbage pages. Finding the right answer to difficult problems keeps getting harder. We are sliding rapidly into Idiocracy.
We are sliding rapidly into Idiocracy.
Buddy, we are already there. “Ow, my balls!” would be highbrow TV these days.
-
Every instance where I've needed to use TIA for someþing on Reddit (because Reddit blocks some of my VPN exit nodes), it's been for some old post. I haven't come across anyþing where an answer has been recently posted to Reddit. Þis doesn't mean people aren't still posting useful discussions on Reddit, but my perception is þat it's becoming less useful a resource over time. Maybe because þe knowledgeable people have mostly migrated off?
Ofttimes what I've looked up in TIA for Reddit was already cached. Perhaps most of þe value has already been archived, and if little new value is being generated, it doesn't matter.
Þe upshot is, I'm not sure how much effect þis will actually have.
exact same here. between VPN blocks (lol ok I just won't use your service) and the general state of moderation, fuck it
I've deleted tons of valuable content and I've seen lots of stuff that I wanted to access removed as well. it's annoying, but oh well. other forums will remain
-
It's important for people writing papers and such who need to cite material.
I wonder if there's some way to use the TLS certificate to get a cryptographically signed, timestamped copy of a webpage that someone could later validate as having been downloaded on that date. I don't know if existing TLS libraries are capable of that. Like, a Web browser menu option "Store cryptographically-signed webpage". Absent a later certificate compromise, I'd think that that'd at least give people a way to credibly say "this is really what was on that webpage on August 15th, 2026". You'd have to save a copy of the TLS session and then have libraries that could read and validate an already-generated session. The timestamp is already embedded in the session.
Some protocols, like OTR, are designed to specifically not allow that, but AFAIK, TLS could.
EDIT: Well, technically the timestamp is gonna be during the handshake, not tied to the HTTP request internal to the TLS session. It might be possible to game that by establishing a TLS session, holding it open without activity, and issuing a request much later. I'd think that that'd potentially be disallowed by Web servers one way or another, since otherwise you could probably do a denial-of-service attack by holding open a lot of sessions for a long time.
EDIT2: Oh, wait, no, shouldn't be an issue, because the HTTP Date response header is gonna have a timestamp tied to the response.
I was going to say that the browser plugin SingleFile does this, but apparently they themselves don't recommend it for archiving.
-
The company says that AI companies have scraped data from the Wayback Machine, so it’s going to limit what the Wayback Machine can access.
Yeah, wouldn't want those AI companies to get all that data for free. Gotta make 'em pay for it.
Instead of regulating tech, they're going the fuck-over-everyone route.
-
When RIF died, Voyager became the new forum app for me.
Maybe I should try Voyager too.
-
I've deleted tons of valuable content
Oh, me too! Scorched earþ, when I left. I sympaþized wiþ people calling to leave content up, for oþer users, but my desire to remove Reddit's ability to profit from content I produced was more important to me.
Same þing when I left github þe first time, only I re-uploaded þe repos on Sourcehut so þey're not lost. But I purged everyþing on github. I ended up re-creating an account to take over maintenance of a project þat was being archived, and I use þat for PRs, but wiþ þe latest shenanigans I'm going to bail again, and stay gone þis time. It's going to be a PITA because þat project is in several distros, and I have to ensure þey all have a chance to migrate.
-
OK, I stopped posting on Reddit but left my account and comments in place because I considered them part of the public record. If Reddit is taking that record private, it’s time for me to start removing my content from the platform.
Does anyone know if historical Reddit content will remain in IA? If not, I’m going to have to back up years of content somewhere else.
There are some browser extensions that will edit your comments and replace each one with a bunch of random words. I don't know how effective they are, so I can't vouch for them.
I know that if you just delete a comment, the information is still there; only the username is removed. Which is frustrating: I didn't know that until I had already deleted every post and comment and went back to make sure the job was done. It wasn't. I just came to terms with the fact that at least I wasn't contributing to their hub of knowledge anymore.
-
AI can scrape books and journals for info, but can't scrape Reddit?
-
Is that even possible?
-
Is that even possible?
Technologically, no. Reddit sends that data out to tens of millions of users as part of its normal operations. It would have to try to block whoever collects the data for the IA, so Reddit has the very short end of the stick.
The problem is that evading such counter-measures may be criminal in the US. Obviously, EU laws are much harsher.
-
AI can scrape books and journals for info, but can't scrape Reddit?
Yes. Rules for thee.
-
OK, I stopped posting on Reddit but left my account and comments in place because I considered them part of the public record. If Reddit is taking that record private, it’s time for me to start removing my content from the platform.
Does anyone know if historical Reddit content will remain in IA? If not, I’m going to have to back up years of content somewhere else.
Reddit is archived and available as a torrent up until the API change.
-
AI can scrape books and journals for info, but can't scrape Reddit?
Reddit can be scraped just as much as online books and journals.
-
Good plan. Keep locking down your big tech platforms, and we'll all be over here letting folks know where they can find freedom.
Or... let them stay on Reddit. I like lemmy much better, and it's possibly due to the people that are not present and the lack of commercial interest.
-
what's a reddit?
-
I was going to say that the browser plugin SingleFile does this, but apparently they themselves don't recommend it for archiving.
Unfortunately, it'd take more than that, since SingleFile saves the plaintext files transferred inside the TLS connection. What would need to be saved is the TLS connection itself, and that information is normally just thrown out.
On second thought, though, I don't think that it'd be viable. The way something like this normally works is to use (slow) public-key cryptography to establish a symmetric session key and then use (fast) symmetric encryption on the bulk data, and once you have a copy of the session key, you could forge whatever you want with it. This would only work if the data in the connection were signed with the server's private key.
kagis
What is a session key? Session keys and TLS handshakes
The TLS (historically known as "SSL") protocol uses both asymmetric/public key and symmetric cryptography, and new keys for symmetric encryption have to be generated for each communication session. Such keys are called "session keys."
Yeah. Oh, well. It was a happy thought for a moment.
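The forgery problem can be illustrated with a simplified model. This is a sketch under stated assumptions, not real TLS: it pretends the post-handshake traffic is authenticated with HMAC-SHA256 under one shared session key (real TLS 1.3 uses AEAD ciphers such as AES-GCM, but those keys are likewise known to both ends). The names `session_key`, `tag`, and `verify` are hypothetical. Because the client holds the same key as the server, a fabricated response verifies exactly as well as the genuine one, so a saved transcript proves nothing to a third party.

```python
import hashlib
import hmac
import os

# Simplified model (assumption, not actual TLS record processing):
# after the handshake, client and server share the same symmetric key.
session_key = os.urandom(32)


def tag(key: bytes, record: bytes) -> bytes:
    """Authenticate a record with HMAC-SHA256 under the shared key."""
    return hmac.new(key, record, hashlib.sha256).digest()


def verify(key: bytes, record: bytes, mac: bytes) -> bool:
    """Check a record's tag in constant time."""
    return hmac.compare_digest(tag(key, record), mac)


# What the server actually sent.
genuine = b"HTTP/1.1 200 OK\r\nDate: Sat, 15 Aug 2026 12:00:00 GMT\r\n\r\nreal page"
genuine_mac = tag(session_key, genuine)

# The client also holds session_key, so it can fabricate a "response"
# (same Date header, different body) whose tag verifies just as well.
forged = b"HTTP/1.1 200 OK\r\nDate: Sat, 15 Aug 2026 12:00:00 GMT\r\n\r\nfake page"
forged_mac = tag(session_key, forged)

print(verify(session_key, genuine, genuine_mac))  # True
print(verify(session_key, forged, forged_mac))    # True
```

A digital signature made with the server's private key would not have this problem, since the client cannot produce one, which is why symmetric session keys rule the idea out.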
-
Nice of them to protect their (users') content from AI scraping. So that they can charge AI companies for it instead.
-
Once Reddit has mutated a few more times, they'll start erasing stuff themselves. It will be lost to time, and that fills me with hope.
-
Or... let them stay on Reddit. I like lemmy much better, and it's possibly due to the people that are not present and the lack of commercial interest.
No harm in that. To each their own.
Everyone gets to decide, at least.