How much data does an Instance want to hold, and how much for specific communities

RoundSparrow · 1 year ago

How much data does an Instance want to hold, and how much for specific communities

RoundSparrow · edited 1 year ago

I am assuming the API middleware of a remote instance could be implemented by leveraging the existing design of Lemmy 0.18.2.

Let’s assume some basic scaling enhancements are made to Lemmy:

community_aggregates keeps track of the timestamps of: post, post edit, comment, comment edit. Votes are trickier, but those could be updated in batch too. Comment votes can probably be ignored in favor of post votes only, tracked as the last post vote change anywhere in the community.
Replication of aggregates for both person and community becomes a feature of Lemmy, similar to how the profiles of communities and persons are already replicated.
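To make the timestamp idea concrete, here is a minimal sketch. It is written in Python for brevity and is not Lemmy's actual Rust code or schema; `CommunityActivity`, its field names, and `apply_vote_batch` are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Optional


@dataclass
class CommunityActivity:
    """Hypothetical per-community activity timestamps (not Lemmy's real schema)."""
    last_post_at: Optional[datetime] = None
    last_post_edit_at: Optional[datetime] = None
    last_comment_at: Optional[datetime] = None
    last_comment_edit_at: Optional[datetime] = None
    last_post_vote_at: Optional[datetime] = None  # post votes only, comment votes ignored

    def latest(self) -> Optional[datetime]:
        """Most recent activity of any kind, or None if nothing was recorded."""
        stamps = [
            self.last_post_at,
            self.last_post_edit_at,
            self.last_comment_at,
            self.last_comment_edit_at,
            self.last_post_vote_at,
        ]
        known = [s for s in stamps if s is not None]
        return max(known) if known else None


def apply_vote_batch(activity: CommunityActivity, vote_times: Iterable[datetime]) -> None:
    """Fold a whole batch of post-vote timestamps into a single aggregate update."""
    newest = max(vote_times, default=None)
    if newest is not None and (
        activity.last_post_vote_at is None or newest > activity.last_post_vote_at
    ):
        activity.last_post_vote_at = newest


# Usage with arbitrary example times: one write per batch instead of one per vote.
activity = CommunityActivity(last_post_at=datetime(2023, 1, 1, 12, 0, tzinfo=timezone.utc))
apply_vote_batch(activity, [datetime(2023, 1, 1, 13, 0, tzinfo=timezone.utc)])
```

A remote instance that receives these replicated timestamps could compare `latest()` against the time it last refreshed its own copy to decide whether anything changed.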

OK, so a site could determine from those aggregates that a community is better suited to a remote stub than to a full copy.
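What that analysis would look like is an open question. As one possible sketch (Python, purely illustrative), an instance might compare how much it reads a community locally against how much federated traffic the community generates. The `CommunityUsage` fields, the `prefer_stub` function, and both thresholds are assumptions of mine, not anything Lemmy has.

```python
from dataclasses import dataclass


@dataclass
class CommunityUsage:
    """Hypothetical usage figures an instance might collect for one community."""
    local_subscribers: int
    local_reads_per_day: float
    remote_events_per_day: float  # posts, comments, and edits arriving via federation


def prefer_stub(
    usage: CommunityUsage,
    max_subscribers: int = 5,
    max_reads_per_event: float = 1.0,
) -> bool:
    """Return True when holding a full copy looks wasteful (hypothetical rule)."""
    if usage.local_subscribers > max_subscribers:
        return False  # enough local interest to justify a full copy
    if usage.remote_events_per_day == 0:
        return False  # nothing arrives, so a full copy costs nothing
    reads_per_event = usage.local_reads_per_day / usage.remote_events_per_day
    return reads_per_event <= max_reads_per_event


# A busy remote community that only two local users occasionally read.
busy_but_unread = CommunityUsage(
    local_subscribers=2, local_reads_per_day=10, remote_events_per_day=5000
)
print(prefer_stub(busy_but_unread))
```

The thresholds would need tuning per instance; the point is only that the decision can be made from numbers the instance already has or could cheaply collect.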

Only reads would pull from the remote API. For the sake of authentication and the API, creating and editing local comments and posts would go through the local process of a stub community.
Existing federation could be enhanced: the incoming federation receive logic would check whether the target is a stub community and, if so, not actually INSERT the new comments and posts into PostgreSQL, but instead set a timestamp flag on community_aggregates to mark the cache as dirty. An alternate implementation could receive as normal and, once a day, purge anything older than 24 hours from stub communities.
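Both variants of the receive logic can be sketched roughly as follows. This is Python with an in-memory list standing in for PostgreSQL, not Lemmy's federation code; `Community`, `Store`, `receive`, `purge_stub_content`, and the `cache_dirty_at` field are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional, Tuple


@dataclass
class Community:
    name: str
    is_stub: bool = False
    cache_dirty_at: Optional[datetime] = None  # hypothetical "cache is dirty" timestamp


@dataclass
class Store:
    """In-memory stand-in for PostgreSQL; each row is (community, received_at, body)."""
    communities: Dict[str, Community] = field(default_factory=dict)
    rows: List[Tuple[str, datetime, str]] = field(default_factory=list)


def receive(store: Store, community_name: str, body: str, now: datetime) -> bool:
    """Variant 1: handle one incoming federated item. Returns True if it was stored."""
    community = store.communities[community_name]
    if community.is_stub:
        # Skip the INSERT and only record that the remote copy has changed.
        community.cache_dirty_at = now
        return False
    store.rows.append((community_name, now, body))
    return True


def purge_stub_content(
    store: Store, now: datetime, max_age: timedelta = timedelta(hours=24)
) -> int:
    """Variant 2: store everything, then drop old rows from stub communities only."""
    cutoff = now - max_age
    kept = [
        row for row in store.rows
        if not (store.communities[row[0]].is_stub and row[1] < cutoff)
    ]
    removed = len(store.rows) - len(kept)
    store.rows = kept
    return removed


# Usage with an arbitrary example time.
now = datetime(2023, 1, 2, tzinfo=timezone.utc)
store = Store(communities={"big": Community("big", is_stub=True), "home": Community("home")})
receive(store, "big", "not stored, only marks the stub dirty", now)
receive(store, "home", "stored as usual", now)
```

The first variant saves the writes entirely; the second keeps the receive path unchanged and only needs a daily job, at the cost of briefly storing content that will be thrown away.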

RoundSparrow · 1 year ago

On July 18, this comment was made on a pull-request:

One major reason I can see for maintaining a split is storage tiers. post_aggregates can know of a post in a community without actually having to hold the content body in PostgreSQL. The content could be fetched through the API of another Lemmy server, or from non-PostgreSQL storage.

This is an area where Lemmy might be well placed for scaling.
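A rough sketch of that storage-tier split, in Python and not based on Lemmy's real code: the aggregate row is always held locally, while the body is looked up locally first and otherwise requested from the post's home instance. `PostAggregate`, `TieredPostStore`, and the injected `fetch_remote` callable are hypothetical; the callable stands in for whatever API call or non-PostgreSQL store would actually supply the content.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class PostAggregate:
    """Small, always-local record; field names are hypothetical."""
    post_id: int
    home_instance: str
    score: int


class TieredPostStore:
    """Aggregates live locally; bodies may live locally or on another tier."""

    def __init__(self, fetch_remote: Callable[[str, int], Optional[str]]) -> None:
        self.aggregates: Dict[int, PostAggregate] = {}
        self.local_bodies: Dict[int, str] = {}
        self._fetch_remote = fetch_remote  # stand-in for an API call or other store

    def body(self, post_id: int) -> Optional[str]:
        """Return the post body, preferring the local tier."""
        if post_id in self.local_bodies:
            return self.local_bodies[post_id]
        aggregate = self.aggregates.get(post_id)
        if aggregate is None:
            return None  # this instance does not know the post at all
        return self._fetch_remote(aggregate.home_instance, post_id)


# Usage with a fake fetcher in place of a real network call.
def fake_fetch(instance: str, post_id: int) -> Optional[str]:
    return f"body of post {post_id} from {instance}"


posts = TieredPostStore(fake_fetch)
posts.aggregates[1] = PostAggregate(post_id=1, home_instance="remote.example", score=42)
print(posts.body(1))
```

Listings and sorting would only touch the aggregates, so the remote fetch happens just when someone actually opens the post.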