Hi everybody!! I have the archive of everything ever posted on r/GenZedong, and I would like to eventually get it online for when the subreddit eventually gets banned (inevitable lol).
would people use it? do people want it? what does everybody think?
uwu
(actual photo of me at the genzedong archive)
Hey, thanks for doing this! It was so long ago when I asked you to do it during the chaotic times.
If you want to host it as a file to download, you could compress the whole thing as a 7Z file, using 7-Zip or PeaZip, to make it compact. FreeARC helps even more with shaving down total file size.
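For anyone who would rather script the packaging step, here is a minimal sketch using only Python's standard library. It writes a `.tar.xz` rather than a `.7z` (Python cannot write 7Z archives without third-party packages), so treat it as a stand-in that also uses LZMA2, not as a replacement for 7-Zip or PeaZip. The function name `pack_directory` and the sample file names are made up for illustration.

```python
import tarfile
import tempfile
from pathlib import Path


def pack_directory(source_dir: Path, archive_path: Path) -> int:
    """Pack source_dir into a .tar.xz archive and return the archive size in bytes."""
    # "w:xz" runs the tar stream through the xz (LZMA2) compressor;
    # preset=9 is the highest standard xz preset.
    with tarfile.open(archive_path, mode="w:xz", preset=9) as archive:
        archive.add(source_dir, arcname=source_dir.name)
    return archive_path.stat().st_size


if __name__ == "__main__":
    # Hypothetical demo data: a throwaway directory with one repetitive text file.
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "posts"
        source.mkdir()
        (source / "example.txt").write_text("some archived post text\n" * 1000)
        size = pack_directory(source, Path(tmp) / "posts.tar.xz")
        print(f"archive size: {size} bytes")
```

A tar-based archive compresses all files as one stream, which tends to help when the archive holds many small, similar text files.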
Or do you want to reimage it as a Lemmygrad archive community? For that I would suggest performing bulk compression on images and videos to save bandwidth.
probably going to do the lemmy archive community idea. i could provide it as a download, but it would be massive lol, it’s a lot of content.
I for one would be interested in a full copy. I could throw it up as a torrent on a seedbox for a while if anyone else wants it as well.
Same as @knfrmity@lemmygrad.ml, I would like to know the 7Z compressed sizes for text only posts, images and videos. Might want to grab text only because there is a lot of nice content from various points in time.
Zstandard for speed or Brotli for compression ratio would probably work better.
Do Zstandard and Brotli have higher compression than 7Z LZMA2, or FreeARC’s ARC format? The latter 2 top efficiency charts, from my archival compression knowledge of the past 10-12 years. Once you encode/package anything, the bandwidth and storage savings are harvested forever.
I did a few tests. I tried compressing a config file with a bunch of algorithms at their highest compression levels. This is what I got:
2880 traefik.nomad
1088 traefik.nomad.gz
1078 traefik.nomad.zst
1100 traefik.nomad.xz
1219 traefik.nomad.7z
918 traefik.nomad.brotli
traefik.nomad is the original. As you can see, Zstandard and Brotli have the best compression ratios. Zstandard is also insanely fast, capable of around 500 MB/sec/core.
This is not the only time I’ve tested this. I’ve done it with videos, images, random text files, documents, etc., and Brotli always wins in compression ratio, while Zstandard always wins in speed.
Configuration text files are not the only type of files. You could use PPMd in 7-Zip for it. You need to use a variety of files to benchmark. Which compression binaries did you use, and what versions? Did you try FreeARC on it?
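To make the "variety of files" point concrete, here is a rough benchmarking sketch. It only covers the codecs in Python's standard library (`zlib`, `bz2`, `lzma`), so Brotli, Zstandard, PPMd and FreeARC are not measured here, and the two sample inputs are synthetic stand-ins I made up, not real archive data. It shows how much the ranking depends on the input rather than settling which tool is best.

```python
import bz2
import lzma
import os
import time
import zlib

# Standard-library codecs only, each at its highest level.
CODECS = {
    "zlib (DEFLATE, as used by gzip), level 9": lambda data: zlib.compress(data, 9),
    "bz2, level 9": lambda data: bz2.compress(data, 9),
    "lzma (xz), preset 9": lambda data: lzma.compress(data, preset=9),
}


def sample_inputs() -> dict[str, bytes]:
    """Build two synthetic inputs: repetitive config-like text and random bytes."""
    config_like = b"".join(
        b'service "web-%d" { port = %d }\n' % (i, 8000 + i) for i in range(2000)
    )
    return {
        "config-like text": config_like,
        # Random bytes behave like already-compressed media: little to gain.
        "random bytes": os.urandom(len(config_like)),
    }


def benchmark(data: bytes) -> dict[str, tuple[int, float]]:
    """Return {codec name: (compressed size in bytes, seconds taken)}."""
    results = {}
    for name, compress in CODECS.items():
        start = time.perf_counter()
        size = len(compress(data))
        results[name] = (size, time.perf_counter() - start)
    return results


if __name__ == "__main__":
    for label, data in sample_inputs().items():
        print(f"{label}: {len(data)} bytes original")
        for name, (size, seconds) in benchmark(data).items():
            print(f"  {name}: {size} bytes in {seconds:.4f} s")
```

Swapping real posts, images and videos into `sample_inputs` would give numbers that actually apply to the archive.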
I saw this: https://peazip.github.io/fast-compression-benchmark-brotli-zstandard.html
https://patrickgawron.com/t3/2018/01/30/multi-threaded-7-zip-with-zstandard-brotli-lz4-lz5-and-lizard-compression-support/
These do not include FreeARC, which compresses even further at little extra time cost. After this, we have PAQ and ZPAQ at various levels, which are impractical for both compression and decompression time efficiency.
I know. As I’ve mentioned, I have used many types of files in the past, and the general trend is that Brotli has the best compression ratios while Zstandard has the best speeds. I just used a config file as it is very compressible.
Running on Arch Linux
I did not try FreeARC as it is abandoned and I can’t seem to figure out how to use its command-line utility.
Also worth mentioning is that 7-Zip and FreeARC are both archiving formats that use modified versions of existing compression algorithms, so I wouldn’t really compare them to the other algorithms on the list.
FreeARC has a GUI for Windows that is runnable under Wine with no performance penalty. You can learn the command line if you wish to avoid Wine usage.
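A small sketch of the format-versus-algorithm distinction, using Python's `lzma` module. It compresses the same bytes with the same LZMA2 settings twice, once as a bare stream and once wrapped in the `.xz` container. Python cannot write `.7z` files from the standard library, so this does not reproduce the 7z number in the list above; it only illustrates that container overhead changes the file size even when the algorithm is identical. The function name and sample data are hypothetical.

```python
import lzma


def lzma2_sizes(data: bytes) -> dict[str, int]:
    """Compress data with identical LZMA2 settings, with and without the xz container."""
    filters = [{"id": lzma.FILTER_LZMA2, "preset": 9}]
    # Bare LZMA2 stream: no header, index, or integrity check.
    raw = lzma.compress(data, format=lzma.FORMAT_RAW, filters=filters)
    # Same filter chain wrapped in the .xz container format.
    xz = lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)
    return {"raw LZMA2 stream": len(raw), "xz container": len(xz)}


if __name__ == "__main__":
    # Made-up sample input standing in for a small config file.
    sample = b"job \"example\" { datacenters = [\"dc1\"] }\n" * 70
    for label, size in lzma2_sizes(sample).items():
        print(f"{label}: {size} bytes")
```

On a file of only a few kilobytes, a fixed overhead of some tens of bytes is a visible share of the result, which is one reason tiny test files can exaggerate differences between formats.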
p7zip 17.04 is very old now. 7-Zip 22.01 has been available for 3 months. Although it should not affect the benchmarking too much, it is something to note. The binaries of 7-Zip 22.01 should be available in PeaZip’s newest version as well.
7Z and ARC are not merely modified versions of existing algorithms, but are traditional compression formats that are geared to be more flexible than the Google- and Facebook-created, TAR-based counterparts here. This is not a correct way to dismiss the comparison, when some are superior in application to others.
Non-traditional compressors are made as an attempt to fine tune and allow Tar-Gzip or Tar-Bzip2 as distribution formats, and to monopolise the archival compression space. They fail at it because of immaturity and worse optimisations.
You can see in the PeaZip benchmark article above how 7Z is far superior even with just default compression settings. This simply changes to unbeatable levels in terms of ratio with higher compression levels like Maximum or Ultra.
Also, abandonment of FreeARC does not mean it is unsuitable for benchmarking or real world usage.