Reddit will block the Internet Archive
-
This post did not contain any content.
Reddit will block the Internet Archive
Reddit caught AI companies scraping its data from the Internet Archive’s Wayback Machine, so it’s going to limit the Internet Archive from indexing some data.
The Verge (www.theverge.com)
Is that even possible?
-
Is that even possible?
Technologically, no. Reddit sends out the data to tens of millions of users as part of its normal operations. It would need to block whoever collects that data for the IA. Reddit has the very short end of the stick.
The problem is that evading such counter-measures may be criminal in the US. Obviously, EU laws are much harsher.
-
AI can scrape books and journals for info, but can't scrape Reddit?
Yes. Rules for thee.
-
OK, I stopped posting on Reddit but left my account and comments in place because I considered them part of the public record. If Reddit is taking that record private, it’s time for me to start removing my content from the platform.
Does anyone know if historical Reddit content will remain in IA? If not, I’m going to have to back up years of content somewhere else.
Reddit is archived and available as a torrent up until the API change.
-
AI can scrape books and journals for info, but can't scrape Reddit?
Reddit can be scraped just as much as online books and journals.
-
Good plan. Keep locking down your big tech platforms, and we'll all be over here letting folks know where they can find freedom.
Or... let them stay on Reddit. I like lemmy much better, and it's possibly due to the people that are not present and the lack of commercial interest.
-
what's a reddit?
-
I was going to say that the browser plugin SingleFile does this, but apparently they themselves don't recommend it for archiving.
Unfortunately, it'll take more than that, since that only saves the plaintext files transferred inside the TLS connection. The information that would need to be saved is the TLS connection itself, and that normally just gets thrown out.
On second thought, though, I don't think that it'd be viable, since the way that something like this normally works is to just use (slow) public key encryption to transfer a symmetric session key and to then use (fast) symmetric encryption on the bulk data, and once you have a copy of the session key, you could forge whatever you want with it. This would only work if you were using asymmetric encryption to encrypt the data in the connection.
kagis
What is a session key? Session keys and TLS handshakes
The TLS (historically known as "SSL") protocol uses both asymmetric/public key and symmetric cryptography, and new keys for symmetric encryption have to be generated for each communication session. Such keys are called "session keys."
Yeah. Oh, well. It was a happy thought for a moment.
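To make the forgery point concrete, here is a minimal sketch using only Python's standard library. It is not real TLS: the `tag` and `verify` helpers, the key, and the messages are all made up for illustration, and HMAC-SHA256 just stands in for whatever integrity protection the session uses. The assumption it illustrates is the one above: both ends hold the same symmetric key, so anything the server can authenticate, the client can authenticate too.

```python
import hashlib
import hmac
import secrets


def tag(session_key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over message with the shared key."""
    return hmac.new(session_key, message, hashlib.sha256).digest()


def verify(session_key: bytes, message: bytes, mac: bytes) -> bool:
    """Check a tag in constant time."""
    return hmac.compare_digest(tag(session_key, message), mac)


# Hypothetical session key; after a handshake both client and server know it.
session_key = secrets.token_bytes(32)

# What the server actually sent, tagged with the shared key.
genuine = b"<html>what the server really sent</html>"
genuine_tag = tag(session_key, genuine)

# The client holds the same key, so it can tag anything it likes.
forged = b"<html>something the client made up</html>"
forged_tag = tag(session_key, forged)

genuine_ok = verify(session_key, genuine, genuine_tag)
forged_ok = verify(session_key, forged, forged_tag)
# Reusing the genuine tag on different content fails, which is all the tag proves.
tampered_ok = verify(session_key, forged, genuine_tag)

print(genuine_ok, forged_ok, tampered_ok)
```

A third party who is handed the key and the transcript can't tell the genuine page from the forged one, since both verify. That's why a saved session doesn't prove what the server sent.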
-
Nice of them to protect their (users') content from AI scraping. So that they can charge AI companies for it instead.
-
Once Reddit has mutated a few more times, they'll start erasing stuff themselves. It will be lost to time, and that fills me with hope.
-
Or... let them stay on Reddit. I like lemmy much better, and it's possibly due to the people that are not present and the lack of commercial interest.
No harm in that. To each their own.
Everyone gets to decide at least.
-
Good plan. Keep locking down your big tech platforms, and we'll all be over here letting folks know where they can find freedom.
Careful. Lemmy is too small to draw the attention of sophisticated, persistent abuse. As a company, Reddit has struggled with revenue and we've all seen those struggles quite publicly. Lemmy instances with those same challenges would probably just fold and close up.
Federated networks give you freedom, but the potential for abuse is proportional to that freedom, and federation is far more expensive taken as a whole.
-
They can keep their shit for themselves, stopped caring a long time ago.
-
Nice of them to protect their (users') content from AI scraping. So that they can charge AI companies for it instead.
They aren’t doing that. They are protecting content from being scraped for free. Reddit is perfectly happy to charge for AI access to user-generated content.
-
that history forgets this period
and thus it repeats
don't worry, we easily repeat what we "learned" anyway
-
And you think reddit actually deletes it? Risk data loss? All that valuable data? No way. They might shadow delete it, but it's there forever.
both of you are correct because you are speaking of different things
-
Technologically, no. Reddit sends out the data to tens of millions of users as part of its normal operations. It would need to block whoever collects that data for the IA. Reddit has the very short end of the stick.
The problem is that evading such counter-measures may be criminal in the US. Obviously, EU laws are much harsher.
Slightly related: can you explain how an archived page I tried to revisit got erased? That has happened to me a few times.
-
what's a reddit?
You use it to scratch your butt, I think.
-
Good plan. Keep locking down your big tech platforms, and we'll all be over here letting folks know where they can find freedom.
'freedom' as long as the mod agrees with you.
-
Or... let them stay on Reddit. I like lemmy much better, and it's possibly due to the people that are not present and the lack of commercial interest.
Just make your own invite-only server if you're so worried about it. Digital freedom should be for everyone, not just a few antisocial nerds.