When I see SQL select statements like this: https://lemmy.ml/comment/793612

I start to think about caching.

  1. Postings with 200 comments or fewer I would maybe treat with an entirely different code path (rough sketch of the idea after this list).

  2. Postings that are lively and being updated, different code path. If it's a fresh-meat topic, with comments going crazy every minute, the code may want to consider that.

  3. Over 200 comments, maybe just dump the whole damn comment set for that posting outside of PostgreSQL into NoSQL or some cache system, then start processing it independently.

  4. Some kind of site self-awareness that the server wants to run on a resource budget or is under some major social-event load (such as an airplane crash getting a huge influx of users for discussion)… and you are willing to shed timely comment publishing to keep your database activity sane. You might be willing to not read new (and edited/deleted) comments from that posting for 2 minutes, just to offload your database.

  5. There may be several front-end “rust code” servers of lemmy_server talking to a single PostgreSQL backend here, and the caching in #3 above would be duplicated on those mid-tier servers.
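
To make the brainstorm concrete, here is a rough Rust sketch of how a front-end server might pick between those code paths. Everything in it is invented for illustration (the `PostingStats` struct, `choose_path`, the 200-comment threshold, the staleness numbers); it is not how lemmy_server actually decides anything today.

```rust
use std::time::{Duration, Instant};

/// Hypothetical per-posting stats the server would already be tracking.
struct PostingStats {
    comment_count: u64,
    last_comment_at: Instant,
}

/// Which code path to use when assembling the comment tree for a posting.
enum CommentPath {
    /// Small, quiet thread: query PostgreSQL directly per request.
    DirectSql,
    /// Small but lively thread: short-lived in-process cache, refreshed often.
    HotCache { refresh: Duration },
    /// Big thread: serve a copy held outside PostgreSQL (NoSQL / cache tier),
    /// rebuilt in the background.
    ExternalCache { max_staleness: Duration },
}

// Illustrative thresholds, not tuned values.
const SMALL_THREAD_MAX: u64 = 200;
const LIVELY_WINDOW: Duration = Duration::from_secs(60);

fn choose_path(stats: &PostingStats, under_heavy_load: bool) -> CommentPath {
    let lively = stats.last_comment_at.elapsed() < LIVELY_WINDOW;

    // Idea 4: under a major load event, accept comments being a couple of
    // minutes stale on big threads, just to offload the database.
    let max_staleness = if under_heavy_load {
        Duration::from_secs(120)
    } else {
        Duration::from_secs(15)
    };

    if stats.comment_count <= SMALL_THREAD_MAX && !lively {
        CommentPath::DirectSql // idea 1
    } else if stats.comment_count <= SMALL_THREAD_MAX {
        CommentPath::HotCache { refresh: Duration::from_secs(5) } // idea 2
    } else {
        CommentPath::ExternalCache { max_staleness } // ideas 3 and 4
    }
}

fn main() {
    let stats = PostingStats {
        comment_count: 1_500,
        last_comment_at: Instant::now(),
    };
    match choose_path(&stats, true) {
        CommentPath::DirectSql => println!("query PostgreSQL directly"),
        CommentPath::HotCache { refresh } => {
            println!("in-process cache, refresh every {refresh:?}")
        }
        CommentPath::ExternalCache { max_staleness } => {
            println!("serve cached tree, up to {max_staleness:?} stale")
        }
    }
}
```

In the multi-server picture from #5, each front-end would hold its own copy of whatever the external cache path serves unless the cache tier itself is shared, so the duplication trade-off from that item still applies.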

Kind of rambling, but it’s on my mind.

  • RoundSparrow (OP, mod) · 1 year ago

    I still haven’t wrapped my head around what Lemmy is doing with comment ranking and sorting, and how it counts up votes. It’s a lot to get into the brain.

  • RoundSparrow (OP, mod) · 1 year ago

    This is me re-processing the same general concepts I was brainstorming 17 days ago on this posting: https://lemmy.ml/post/1160776

    It’s hard to know where to begin on this. Reddit’s code was open source 6 years ago and uses PostgreSQL; I’d really like to hear if someone knows how they dealt with scaling (at least as of 6 years ago).