There are a few different options for how to host the archives.

Here’s what /r/datahoarder is doing with redarc

We could import it here, put it in a separate community on this server, or host it with redarc on a subdomain. It’s pretty much whatever.

I’ll put a survey up for a vote once I finish that server again; I thought a discussion would be good to have before that goes up.

Thoughts??

    • ProfessionalHandJob@lemmy.beyondcombustion.net (OP)
      1 year ago

      Either a separate archive community, or import it right into vaporents on BeyondCombustion, plus a bot to scrape new posts from the Reddit RSS feed and repost them here (but under a bot account).

      I think it would be more engaging for Lemmy to have posts pulled in at the beginning, at least until conversation takes place more naturally.

      Absolutely going to host it 💯. Just trying to decide exactly what format… If it works well I’d be willing to do the same for other subreddits. I’ll be documenting the steps, but needing PostgreSQL access on the Lemmy server is a non-starter for most subreddits/people/mods, unless they have their own infrastructure too.

        • ProfessionalHandJob@lemmy.beyondcombustion.net (OP)
          1 year ago

          Agree they will at some point nuke that subreddit.

          Pretty much everything has been archived that could have been, since the API goes dark in ~2 days.

          There are details of what apps were used, where to download the archives directly, links to torrents and such in a post from /r/DataHoarder. I’m collecting links/text/guides from it over in our Gitea instance, as well as importing the projects used for this effort.

          So far, I’ve downloaded roughly 5-6 TB of archives through the links in that post, others on /r/DataHoarder, the-eye.eu, etc.

          They go back allll the way to 2005… It’s just text though, no media, unless the media was a link to Imgur or YouTube or something, in which case the links are in the posts.

          There are a couple of bots/scripts that will repost new stuff from RSS feeds going forward.
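
          For anyone curious what such a bot boils down to, here is a rough sketch of the "find new posts in the feed" half. It assumes the feed is Atom-formatted XML (which is what Reddit's .rss URLs are understood to return); the function names, the sample feed, and the idea of keeping a set of already-seen IDs are all illustrative and not taken from any of the actual bots. Fetching the feed and reposting are left out.

```python
import xml.etree.ElementTree as ET

# Namespace used by Atom feeds (assumption: the subreddit feed is Atom).
ATOM = {"atom": "http://www.w3.org/2005/Atom"}


def parse_feed(xml_text):
    """Return a list of {id, title, link} dicts from an Atom feed string."""
    root = ET.fromstring(xml_text)
    posts = []
    for entry in root.findall("atom:entry", ATOM):
        link = entry.find("atom:link", ATOM)
        posts.append({
            "id": entry.findtext("atom:id", default="", namespaces=ATOM),
            "title": entry.findtext("atom:title", default="", namespaces=ATOM),
            "link": link.get("href", "") if link is not None else "",
        })
    return posts


def new_posts(posts, seen_ids):
    """Keep only posts whose id is not in seen_ids, then mark them as seen."""
    fresh = [p for p in posts if p["id"] not in seen_ids]
    seen_ids.update(p["id"] for p in fresh)
    return fresh


# Hypothetical two-entry feed, just to exercise the functions offline.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><id>t3_aaa</id><title>First post</title><link href="https://example.com/a"/></entry>
  <entry><id>t3_bbb</id><title>Second post</title><link href="https://example.com/b"/></entry>
</feed>"""

if __name__ == "__main__":
    seen = {"t3_aaa"}  # pretend the first post was already reposted
    for post in new_posts(parse_feed(SAMPLE), seen):
        print(post["title"], post["link"])
```

          A real bot would persist the seen IDs between runs (a small file or table) so that a restart does not repost the whole feed.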

          To inject directly from those backups into a Lemmy PostgreSQL database, I was using this tool, RedditLemmyImporter. It actually looks like it was originally made by the Lemmy developer dessalines to move the r/GenZhou subreddit into lemmy.ml/lemmygrad.ml, but was forked very early to try to obscure that… so I do feel a bit dirty about that, and more so about Lemmy in general from some of the things I’ve seen from dessalines themselves.

          TBH… I think it’s important to make a fork of Lemmy itself, and to really, really comb through this code base. I’m not sure Lemmy is a long-term solution if dessalines is still the head dev, which is one of the reasons I’m setting up other forums on this server as well. Lemmy and non-corp/federated social media are good, but I like the stewards of the Lemmy code base less the more I read their direct words, actions, history, and code like this importer.

        • ProfessionalHandJob@lemmy.beyondcombustion.net (OP)
          1 year ago

          Also, if there are communities out there that you’re interested in helping do this, they have to either:

          1. have their own server

          or

          2. get direct access to the PostgreSQL database on whatever new server they want to set up a community in.

          I’m sure there are other ways to import it, like scripting something that literally re-posts everything through API calls to Lemmy or, ugh, clicking through the web GUI lmao. Without that database access I’d consider it too much work/hassle to be practical.
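
          To give a feel for the API-call route, here is a minimal sketch that turns one archived submission into a request against a Lemmy instance. Everything specific in it is an assumption to check against your own setup: the archive field names (`title`, `selftext`, `author`, `is_self`, `url`) follow the usual Reddit JSON layout, the `/api/v3/post` path and Bearer-token header reflect newer Lemmy releases as far as I understand them (older ones reportedly took an `auth` field in the JSON body instead), and the 200-character title cap is a guess at the server limit. The helper names are made up for illustration. The request is only built, never sent.

```python
import json
import urllib.request

MAX_TITLE = 200  # assumed title length limit; verify against your Lemmy version


def to_lemmy_post(submission, community_id):
    """Map one archived Reddit submission (a dict) to a create-post payload."""
    author = submission.get("author", "[deleted]")
    credit = f"Originally posted by u/{author} on Reddit."
    text = submission.get("selftext", "")
    payload = {
        "name": submission["title"][:MAX_TITLE],
        "community_id": community_id,
        "body": f"{text}\n\n{credit}".strip(),
    }
    # Self posts point their url back at Reddit itself, so only keep the
    # url for link posts (Imgur, YouTube, and so on).
    if not submission.get("is_self", False) and submission.get("url"):
        payload["url"] = submission["url"]
    return payload


def build_request(instance, jwt, payload):
    """Build (but do not send) the POST request that would create the post."""
    return urllib.request.Request(
        f"{instance}/api/v3/post",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {jwt}",
        },
        method="POST",
    )


if __name__ == "__main__":
    example = {
        "title": "Hello",
        "selftext": "Body text",
        "author": "someone",
        "is_self": True,
        "url": "https://www.reddit.com/r/x/comments/abc/hello/",
    }
    req = build_request("https://lemmy.example", "TOKEN", to_lemmy_post(example, 42))
    print(req.get_method(), req.full_url)
```

          Actually sending it would be a `urllib.request.urlopen(req)` call per post, throttled to stay under the instance's rate limits. Every post also ends up under the bot account with the current timestamp, which is a large part of why the direct database route is less hassle for a big archive.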