Right now, every single comment and post vote is federated individually, and each one is stored as its own row in PostgreSQL tables.

Each person and community has a home instance. The only accurate counts of total posts, comments, and votes are on that home instance. And even then, it is theoretically possible that, to save disk space and/or improve performance, a community's home instance could purge older data (or suffer data loss).

For Lemmy-to-Lemmy federation, I think that instead of federating votes independently of the posts and comments, instances could share aggregate data directly.

The model for this is how person profiles are federated. When a person revises their profile on their home instance, such as uploading a new image or editing their bio, every other instance gets the update. The same applies when a community's profile is revised, such as changing its icon or sidebar.

The code redesign could start out by making person and community aggregate counts part of those revisions. Those numbers are not that time-sensitive: the statistics for the number of users, posts, and comments in a community could be behind by many hours without any real impact on the end-user experience of reading posts and comments on Lemmy.
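As a sketch of what that could look like (purely illustrative: these field and type names are not ActivityPub vocabulary or Lemmy's current wire format), the federated Group object for a community could carry its home instance's counts alongside the profile fields:

```rust
use serde::{Deserialize, Serialize};

// Hypothetical counts block embedded in the federated Group (community)
// object. When the home instance sends an Update for an icon or sidebar
// change, these numbers ride along at no extra cost.
#[derive(Serialize, Deserialize)]
pub struct GroupAggregates {
    pub subscribers: i64,
    pub posts: i64,
    pub comments: i64,
}
```

Since the home instance is authoritative for these numbers, a receiving instance can simply overwrite its local copy rather than try to merge.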

With votes on posts it is trickier. But some experiments could be done, such as only sending post aggregates when a comment on that post is created, deleted, or edited… with a more back-fill-oriented bulk operation taking care of discovering and correcting out-of-sync information.
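A minimal sketch of that piggyback idea, assuming a hypothetical snapshot object attached to federated comment activities (none of these names exist in Lemmy today):

```rust
use chrono::{DateTime, Utc};
use serde::{Deserialize, Serialize};

// Hypothetical snapshot of the home instance's authoritative counts for a
// post, attached to Create/Update/Delete activities for its comments.
#[derive(Serialize, Deserialize)]
pub struct PostAggregatesSnapshot {
    pub post_ap_id: String, // ActivityPub id of the post
    pub comments: i64,
    pub score: i64,
    pub as_of: DateTime<Utc>,
}

// Receiving side: the home instance is authoritative, so apply
// last-writer-wins on its timestamp and ignore out-of-order snapshots.
pub fn apply_snapshot(local: &mut PostAggregatesSnapshot, incoming: PostAggregatesSnapshot) {
    if incoming.as_of > local.as_of {
        *local = incoming;
    }
}
```

The back-fill job would then amount to periodically requesting fresh snapshots for posts that have gone quiet.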

  • RoundSparrowOP · 1 year ago
    Performance isn’t the only factor: there’s the HTTP connection overhead of federating every single vote on a post or comment… and storing votes individually takes more space than a per-instance aggregate.
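    For rough scale (assumed numbers, not measurements): if a vote row plus its indexes costs on the order of 100 bytes, a post with 10,000 federated votes takes roughly 1 MB on every instance that stores the raw rows, while a single aggregate row per post stays a few dozen bytes no matter how many votes it summarizes. Delivery has the same ratio: 10,000 individual vote activities versus one aggregate update.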

    A hybrid approach could be an option, since all the code is there now for a PostgreSQL row per vote… for example, after 14 days (or whatever) aggregate the old votes and move the raw data out of PostgreSQL for archive/etc.
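    A sketch of that kind of archival job, assuming tokio_postgres; post_like loosely follows Lemmy's actual vote table, but post_vote_archive and the cutoff are illustrative:

    ```rust
    use tokio_postgres::{Client, Error};

    // Hypothetical job: fold vote rows older than `days` into one aggregate
    // row per post, then delete the raw rows. Assumes a unique index on
    // post_vote_archive(post_id). Runs in one transaction so no counts are
    // lost if the delete fails partway.
    pub async fn archive_old_votes(db: &mut Client, days: i32) -> Result<(), Error> {
        let tx = db.transaction().await?;
        tx.execute(
            "INSERT INTO post_vote_archive (post_id, score, votes)
             SELECT post_id, SUM(score), COUNT(*)
               FROM post_like
              WHERE published < now() - make_interval(days => $1)
              GROUP BY post_id
             ON CONFLICT (post_id)
             DO UPDATE SET score = post_vote_archive.score + EXCLUDED.score,
                           votes = post_vote_archive.votes + EXCLUDED.votes",
            &[&days],
        )
        .await?;
        tx.execute(
            "DELETE FROM post_like WHERE published < now() - make_interval(days => $1)",
            &[&days],
        )
        .await?;
        tx.commit().await
    }
    ```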