Reddit will block the Internet Archive
-
Good plan. Keep locking down your big tech platforms, and we'll all be over here letting folks know where they can find freedom.
Or... let them stay on Reddit. I like lemmy much better, and it's possibly due to the people that are not present and the lack of commercial interest.
-
This post did not contain any content.
Reddit will block the Internet Archive
Reddit caught AI companies scraping its data from the Internet Archive’s Wayback Machine, so it’s going to limit the Internet Archive from indexing some data.
The Verge (www.theverge.com)
what's a reddit?
-
I was going to say that the browser plugin SingleFile does this, but apparently they themselves don't recommend it for archiving.
Unfortunately, it'll be more than that, since that would only save the plaintext files transferred inside the TLS connection. The information that would need to be saved is the TLS connection itself, and that normally just gets thrown away.
On second thought, though, I don't think it would be viable. The way something like this normally works is to use (slow) public-key cryptography to establish a symmetric session key and then use (fast) symmetric encryption on the bulk data, and once you have a copy of the session key, you could forge whatever you want with it. This would only work if asymmetric cryptography were used to protect the data in the connection itself.
kagis
What is a session key? Session keys and TLS handshakes
The TLS (historically known as "SSL") protocol uses both asymmetric/public key and symmetric cryptography, and new keys for symmetric encryption have to be generated for each communication session. Such keys are called "session keys."
Yeah. Oh, well. It was a happy thought for a moment.
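The forgery problem with saved session keys can be made concrete with a small Python sketch. It uses HMAC-SHA256 from the standard library as a simplified stand-in for TLS record protection (real TLS 1.3 uses AEAD ciphers such as AES-GCM, but the shared-key property is the same), and the helper names `protect` and `verify` and the sample payloads are hypothetical. Because both ends hold the same symmetric key, anyone with a saved copy of it can produce traffic that checks out, so a recording plus the key can't prove what the server actually sent.

```python
import hashlib
import hmac
import secrets

# Simplified stand-in for TLS record protection (hypothetical helpers):
# real TLS 1.3 uses AEAD ciphers such as AES-GCM, but the key is shared
# by both endpoints in the same way.

def protect(session_key: bytes, payload: bytes) -> bytes:
    """Return an HMAC-SHA256 authentication tag for payload under the session key."""
    return hmac.new(session_key, payload, hashlib.sha256).digest()


def verify(session_key: bytes, payload: bytes, tag: bytes) -> bool:
    """Check a tag using a constant-time comparison."""
    return hmac.compare_digest(protect(session_key, payload), tag)


# After the handshake, client and server hold the same symmetric session key.
session_key = secrets.token_bytes(32)

# The server's genuine page and its tag, as an archiver might record them.
genuine = b"<p>original comment</p>"
genuine_tag = protect(session_key, genuine)

# Anyone who saved the session key can mint a valid tag for altered content.
forged = b"<p>edited comment</p>"
forged_tag = protect(session_key, forged)

# Both verify, so the saved key cannot show which one the server really sent.
print(verify(session_key, genuine, genuine_tag))  # True
print(verify(session_key, forged, forged_tag))    # True
```

A signature scheme would avoid this, since only the holder of the private key could produce a valid signature, which is the "asymmetric" case described above.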
-
Nice of them to protect their (users') content from AI scraping. So that they can charge AI companies for it instead.
-
When Reddit has mutated a few more times, they'll start erasing stuff themselves. It will be lost to time, and that fills me with hope.
-
Or... let them stay on Reddit. I like lemmy much better, and it's possibly due to the people that are not present and the lack of commercial interest.
No harm in that. To each their own.
Everyone gets to decide at least.
-
Good plan. Keep locking down your big tech platforms, and we'll all be over here letting folks know where they can find freedom.
Careful. Lemmy is too small to draw the attention of sophisticated, persistent abuse. As a company, Reddit has struggled with revenue and we've all seen those struggles quite publicly. Lemmy instances with those same challenges would probably just fold and close up.
Federated networks give you freedom, but the potential for abuse is proportional to that freedom, and at the same time, federation is far more expensive taken as a whole.
-
They can keep their shit for themselves, stopped caring a long time ago.
-
Nice of them to protect their (users') content from AI scraping. So that they can charge AI companies for it instead.
They aren’t doing that. They are protecting content from being scraped for free. Reddit is perfectly happy to charge for AI access to user-generated content.
-
that history forgets this period
and thus it repeats
don't worry, we easily repeat what we "learned" anyway
-
And you think Reddit actually deletes it? Risk data loss? All that valuable data? No way. They might shadow-delete it, but it's there forever.
both of you are correct because you are speaking of different things
-
Technologically no. Reddit sends out the data to 10s of millions of users as part of their normal operations. They need to try to block those who collect that data for the IA. Reddit has the very short end of the stick.
The problem is that evading such counter-measures may be criminal in the US. Obviously, EU laws are much harsher.
Slightly related: can you explain how an archived page I tried to revisit got erased? It has happened to me a few times.
-
what's a reddit?
You use it to scratch your butt, I think.
-
Good plan. Keep locking down your big tech platforms, and we'll all be over here letting folks know where they can find freedom.
'freedom' as long as the mod agrees with you.
-
Or... let them stay on Reddit. I like lemmy much better, and it's possibly due to the people that are not present and the lack of commercial interest.
Just make your own invite-only server if you're so worried about it. Digital freedom should be for everyone, not just a few antisocial nerds.
-
So reddit will become even less valuable
-
Just make your own invite-only server if you're so worried about it. Digital freedom should be for everyone, not just a few antisocial nerds.
I'm not worried about anything.
-
Technologically no. Reddit sends out the data to 10s of millions of users as part of their normal operations. They need to try to block those who collect that data for the IA. Reddit has the very short end of the stick.
The problem is that evading such counter-measures may be criminal in the US. Obviously, EU laws are much harsher.
Not to mention all of Asia, South America, Africa...
-
Careful. Lemmy is too small to draw the attention of sophisticated, persistent abuse. As a company, Reddit has struggled with revenue and we've all seen those struggles quite publicly. Lemmy instances with those same challenges would probably just fold and close up.
Federated networks give you freedom, but the potential for abuse is proportional to that freedom, and at the same time, federation is far more expensive taken as a whole.
I'm sure it would persist even after a bout of malicious activity. At worst, it may turn out like email, with servers needing to be added to an allowlist and more moderation. I think scalability might become the limiting factor at some point, though, and as a result we could end up with several disconnected islands of server clusters instead of globally meshed servers.
-
It’s another move to protect against AI scraping that isn't paying them for access.
Wasn't Reddit complaining a couple of years ago that too many AI bot crawls were stressing its servers?
Doesn't the internet archive relieve that stress?
-