If you wanna write code to do this … I’d say skip the bot, write a gateway instead.
Back in the early days of email, there were lots of different email systems, not just the SMTP Internet email we use today. There was UUCP email with “bang paths”, where your email address specified a list of servers that a message could be passed through to get to you. There were other networks, like FidoNet and WWIVnet, that could send email to Internet email addresses through special “gateway” servers.
A gateway receives messages using one protocol or service, and retransmits or makes them available on another protocol or service.
For a little while in 1992, I had access to read Usenet posts only through a gateway that exported Usenet posts onto the Gopher system.
A gateway between Reddit and Lemmy would appear to Reddit as a web browser, scraping posts and comments, while appearing to Lemmy as a Lemmy instance that users could subscribe to, making each subreddit it scrapes available as a Lemmy community.
So a Lemmy user could subscribe to, say, !askreddit@reddittolemmy.com and see a fresh view of AskReddit. The server at reddittolemmy.com would not be a standard Lemmy server with users, but rather a custom gateway server that fetches data from Reddit and makes it available in the form of a Lemmy community.
(If Reddit were not being an asshole, a gateway could be an API client. But Reddit is being an asshole, so a gateway should probably be written as a scraper that accesses Reddit as if it were a normal user using a desktop Web browser.)
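To make the two faces of the gateway concrete, here is a minimal sketch of just the translation step in the middle. Everything in it is an assumption for illustration: the `ScrapedPost` and `CommunityPost` records, their fields, and the `community_handle` and `translate` helpers are hypothetical names, and no real scraping or federation happens here. Only the `reddittolemmy.com` host and the `!askreddit@...` handle shape come from the example above.

```python
from dataclasses import dataclass

# Example gateway host taken from the text above; not a real service.
GATEWAY_HOST = "reddittolemmy.com"


@dataclass(frozen=True)
class ScrapedPost:
    """A post as the scraper side might see it (hypothetical fields)."""
    subreddit: str
    title: str
    body: str
    permalink: str  # path on Reddit, e.g. "/r/AskReddit/comments/abc123/"


@dataclass(frozen=True)
class CommunityPost:
    """The same post as the Lemmy-facing side might republish it (hypothetical fields)."""
    community: str
    title: str
    body: str
    source_url: str


def community_handle(subreddit: str, host: str = GATEWAY_HOST) -> str:
    """Map a subreddit name to the community handle a Lemmy user would subscribe to."""
    return f"!{subreddit.lower()}@{host}"


def translate(post: ScrapedPost) -> CommunityPost:
    """Turn one scraped post into the record the gateway would serve as a community post."""
    return CommunityPost(
        community=community_handle(post.subreddit),
        title=post.title,
        body=post.body,
        source_url="https://www.reddit.com" + post.permalink,
    )
```

The translation is the easy part. A real gateway would still need a scraper feeding `ScrapedPost` values in, and a server on the other side that other Lemmy instances accept as a peer.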
This is a great idea.
I don’t particularly think the whole of Reddit needs to be scraped, though. I could be happy with only scraping posts that pass a certain threshold of votes relative to the subreddit’s subscriber count, and maybe getting those crossposted to the equivalent Lemmy communities that want to opt in to such a service. This would be especially useful for World News and the more niche subreddits that don’t yet have a big enough userbase here.
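A rough sketch of that filter, assuming the score and subscriber count are already known for each post. The function names, the tuple layout, and the default ratio of 0.001 are all made up for illustration; the right threshold would need tuning per community.

```python
def passes_threshold(score: int, subscribers: int, min_ratio: float = 0.001) -> bool:
    """Return True if the post's votes are high enough relative to subscriber count.

    min_ratio is a hypothetical default, not a recommended value.
    """
    if subscribers <= 0:
        return False  # avoid dividing by zero for empty or unknown subreddits
    return score / subscribers >= min_ratio


def select_for_crosspost(posts, opted_in, min_ratio: float = 0.001):
    """Keep posts from opted-in subreddits that pass the vote threshold.

    posts: iterable of (subreddit, score, subscribers) tuples (hypothetical layout).
    opted_in: set of lowercase subreddit names whose Lemmy equivalents opted in.
    """
    return [
        (subreddit, score, subscribers)
        for subreddit, score, subscribers in posts
        if subreddit.lower() in opted_in
        and passes_threshold(score, subscribers, min_ratio)
    ]
```

For example, with `opted_in = {"worldnews"}`, a post in `worldnews` with 500 votes and 100,000 subscribers is kept, while one with 50 votes is dropped, and posts from subreddits that did not opt in are ignored regardless of score.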
The hard part isn’t describing which posts or comments need to be gatewayed.
The hard part is being able to deliver posts and comments across the gateway at all.