(attempt to cross-post from /c/programming)

Idea: scrape all the posts from a subreddit as they’re being made and “archive” them on a Lemmy instance, making it very clear the content is rehosted and linking back to the originals. It would probably have to be a “closed” Lemmy instance set up specifically for this purpose. The tool would run for multiple subreddits, so Lemmy users could still keep up with and discuss content that would otherwise get left behind on Reddit.
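
For the scraping half, the official API isn’t strictly needed - something like this untested sketch can poll a subreddit’s public JSON listing (the User-Agent and contact address are placeholders, and Reddit rate-limits unauthenticated requests pretty aggressively):

```python
import requests

def fetch_new_posts(subreddit: str, limit: int = 25) -> list[dict]:
    """Fetch the newest posts from a subreddit via its public JSON listing."""
    resp = requests.get(
        f"https://www.reddit.com/r/{subreddit}/new.json",
        params={"limit": limit},
        # A descriptive User-Agent helps avoid being blocked outright; this one is a placeholder.
        headers={"User-Agent": "subreddit-archiver/0.1 (contact: example@example.com)"},
        timeout=30,
    )
    resp.raise_for_status()
    listing = resp.json()

    posts = []
    for child in listing["data"]["children"]:
        d = child["data"]
        posts.append({
            "id": d["id"],                # used to avoid re-posting duplicates
            "title": d["title"],
            "author": d["author"],
            "selftext": d.get("selftext", ""),
            "url": d.get("url"),
            # Permalink back to the original post on Reddit.
            "permalink": f"https://www.reddit.com{d['permalink']}",
        })
    return posts
```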

Thoughts? It’s probably iffy copyright-wise, but I think I can square my conscience with it.


Update: per the feedback, I’ve acquired a separate instance and started coding. Just tonight I managed to clone some posts from a subreddit onto it. (I’m intentionally being vague about which instance because I’ll probably wipe and reset the communities on there a couple of times, and that messes up federation.)
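
For anyone curious about the cloning step, it’s essentially two calls against Lemmy’s v3 HTTP API - this is a rough, untested sketch (the instance URL is a placeholder, `post` is a dict of scraped fields like title/author/permalink, and the exact auth handling depends on the Lemmy version):

```python
import requests

LEMMY = "https://my-archive-instance.example"  # placeholder instance URL

def lemmy_login(username: str, password: str) -> str:
    """Log in to the instance and return a JWT for subsequent API calls."""
    resp = requests.post(
        f"{LEMMY}/api/v3/user/login",
        json={"username_or_email": username, "password": password},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["jwt"]

def clone_post(jwt: str, community_id: int, post: dict) -> None:
    """Create a Lemmy post mirroring a scraped Reddit post, with a link back."""
    body = (
        f"*Archived from [the original Reddit post]({post['permalink']}) - "
        f"originally posted by u/{post['author']}.*\n\n{post['selftext']}"
    )
    resp = requests.post(
        f"{LEMMY}/api/v3/post",
        json={
            "name": post["title"],        # Lemmy uses "name" for the post title
            "community_id": community_id,
            "url": post["url"],
            "body": body,
        },
        # Lemmy 0.19+ takes the JWT as a bearer token; older versions expect
        # an "auth" field in the JSON body instead.
        headers={"Authorization": f"Bearer {jwt}"},
        timeout=30,
    )
    resp.raise_for_status()
```

A real run would also need to look up the target `community_id` (e.g. via `GET /api/v3/community?name=...`) and remember which Reddit post IDs have already been mirrored so nothing gets duplicated.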

The goal is to have all the communities be read-only for non-mods (the only mods will be the admin account and the bots), but also to have a separate request community where anyone can ask for subreddits to be cloned. I’ll keep updating this post here - still figuring all of this out as I go along :)

  • borari@sh.itjust.works · 1 year ago

    Reddit implemented rate limits on page loads to combat the inevitable web scraping

    This whole time I was wondering how the API changes made any sense when anyone disgruntled about it could just turn to scraping, putting drastically more load on Reddit’s infrastructure. It makes me feel a bit better that they aren’t that clueless.