Internet Archive played crucial role in tracking shady CDC data removals

ForgottenFlux@lemmy.world · 20 hours ago

Internet Archive played crucial role in tracking shady CDC data removals

CitricBase@lemmy.world · 18 hours ago

As of five years ago, 70 petabytes: https://blog.adafruit.com/2020/12/01/donate-to-the-internet-archive-digital-library-of-free-borrowable-books-movies-music-wayback-machine-internetarchive/

in 2012 it was 10 petabytes. Now, it’s probably well over 100 petabytes. I think it well beyond the scope of torrents by now.

dan@upvote.au · 6 hours ago

This comment from 8 months ago says 152PB: https://www.reddit.com/r/DataHoarder/comments/1cu79ke/the_archiveteam_has_a_cost_shameboard_of_the_top/l4om4m6/

TimeSquirrel@kbin.melroy.org · 14 hours ago

Is there a way to distribute it so everyone just has parts of it? Aren’t there p2p cloud storage solutions that exist?

9point6@lemmy.world · edit-2 7 hours ago

The problem is you’d need to split it down to an amount that people would be happy hosting and then host it multiple times in case any node goes offline.

Another comment in the thread says it’s likely over 100PB today (100,000 terabytes). I’d say 4 copies (spread over different time zones) is a relatively minimal level of redundancy (people may host on machines that aren’t powered all the time), and I reckon you’d get a network with the most participants, whilst still getting enough storage, at around the 150gb per node mark.

That comes to nearly 3 million participants needed just to cover today’s archive, new people will obviously need to join every day. Also given I imagine it would need to be open to all, the redundancy level could do with increasing to avoid malicious actors with a lot of resources taking on a lot of the network and forcing it all offline at once in an effort to cause data loss

Nothing here is insurmountable, but also not remotely easy

BassTurd@lemmy.world · 18 hours ago

That’s a bit more than my home server can handle. I could maybe take some CDC data, but definitely not the full shebang. It would be neat if someone could segment the data so we could save some more critical things.

xektop@lemmy.world · 10 hours ago

A couple years ago I read that Filecoin has teamed up with the internet archive to synchronize the data on the Blockchain. I’m not sure how far they are yet, but it’s something that could work if it doesn’t turn out to be just crypto hype in the end.