There are also issues lurking in the accumulation of data. A move to a batch processing system should also consider that some instance operators may wish to retain only 60 days of fresh content, rather than a community's entire history, for search engines and local search. The difference in performance is huge, and it is a big part of why popular Lemmy servers have been crashing constantly: the amount of data in the tables entirely changes the performance characteristics.

Right now, Lemmy has no concept of tiered storage, or of content that is absent because of replication or purge choices. Looking from the bottom up, at the API layer before a request ever touches PostgreSQL, a smart caching layer could even proxy to the API of a peer instance and offer a virtual (cached) copy of the data for a listing or post. Such a design could intelligently choose to do this for the small number of requests that touch a post from months or years ago, avoiding the burden of storing it in PostgreSQL when only a few people take a recent interest in it (or a search engine wants to pull a copy of old posts).
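A minimal sketch of that idea, with the network call stubbed out: a local map stands in for the hot rows kept in PostgreSQL, and a miss falls through to a (hypothetical) peer-instance fetch whose result is cached so repeated interest in an old post does not re-fetch it. The names `PostCache` and `fetch_from_peer` are illustrative, not Lemmy APIs.

```rust
use std::collections::HashMap;

// Hypothetical tiered lookup: hot posts are served locally, cold posts
// are proxied to a peer instance's API and then cached.
struct PostCache {
    hot: HashMap<u64, String>,   // recent post bodies kept locally
    fetched_from_peer: Vec<u64>, // record of proxied lookups, for illustration
}

impl PostCache {
    fn get(&mut self, id: u64) -> String {
        // Served from local storage when present.
        if let Some(body) = self.hot.get(&id) {
            return body.clone();
        }
        // Otherwise proxy to a peer and cache the virtual copy.
        self.fetched_from_peer.push(id);
        let body = fetch_from_peer(id);
        self.hot.insert(id, body.clone());
        body
    }
}

// Stand-in for an HTTP call to a peer Lemmy instance's API.
fn fetch_from_peer(id: u64) -> String {
    format!("peer copy of post {id}")
}

fn main() {
    let mut cache = PostCache { hot: HashMap::new(), fetched_from_peer: Vec::new() };
    cache.hot.insert(1, "local post 1".to_string());
    println!("{}", cache.get(1));  // hit: served from local storage
    println!("{}", cache.get(99)); // miss: proxied to peer, then cached
    println!("{}", cache.get(99)); // second request served from cache
    println!("peer fetches: {}", cache.fetched_from_peer.len());
}
```

The point of the sketch is that the second request for post 99 never reaches the peer (or PostgreSQL), which is what keeps the cold tier cheap.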

  • RoundSparrowOPM · 1 year ago

    On July 18, this comment was made on a pull request:

    One major reason I can see for maintaining a split is storage tiers. post_aggregates can know of a post in a community without PostgreSQL actually having to hold the content body. An API call to another Lemmy server could fetch the content, or it could come from non-PostgreSQL storage.

    This is an area where Lemmy might be well placed for scaling.
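The split the comment describes could be sketched as follows: aggregate metadata is always present, while the body is an enum that is either held locally or absent, with only a pointer left behind for fetching it from elsewhere. `PostAggregates` echoes the real table name from the comment; `PostBody` and `body_text` are hypothetical illustrations, not Lemmy types.

```rust
// Aggregate row: always stored in PostgreSQL, small and cheap to keep.
struct PostAggregates {
    post_id: u64,
    comments: i64,
    score: i64,
}

// The body may live locally, or be evicted with only a pointer retained.
enum PostBody {
    Local(String),            // content body held in PostgreSQL
    Remote { ap_id: String }, // body absent; fetch via peer API or other storage
}

// Resolve a body to text; a real implementation would perform the fetch.
fn body_text(body: &PostBody) -> String {
    match body {
        PostBody::Local(text) => text.clone(),
        PostBody::Remote { ap_id } => format!("fetch {ap_id} from peer"),
    }
}

fn main() {
    let meta = PostAggregates { post_id: 42, comments: 3, score: 10 };
    let body = PostBody::Remote { ap_id: "https://example.com/post/42".to_string() };
    // The instance can rank and list the post from metadata alone...
    println!("post {} has {} comments, score {}", meta.post_id, meta.comments, meta.score);
    // ...and only resolve the body when someone actually opens it.
    println!("{}", body_text(&body));
}
```

The design choice is that listings and sorting need only `PostAggregates`, so evicting cold bodies shrinks the expensive tables without losing the post from view.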