Lemmy.world had to shut down its front page and put up a message about the load, along with a graph. They seem to chalk it up to the tendency of social media sites to attract attacks.

I’d hack up the Rust code so it is aware of how much concurrent work it has in flight against PostgreSQL, and have it return a new “busy” error.

Federation connections, RSS feeds, the API, and any other path that hits the database need a concurrency count in the Rust code and a way to report a busy error.
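A minimal sketch of a per-entry-point concurrency count, assuming plain `std` atomics (Lemmy itself is async and uses its own connection pool, so this is not its API). `ConcurrencyCounter`, `BusyError`, `InFlightGuard`, and the limits on `FEDERATION`, `RSS`, and `API` are all hypothetical. The guard decrements on drop, so an early return or `?` cannot leak a slot.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical "busy" error returned when an entry point is saturated.
#[derive(Debug, PartialEq, Eq)]
pub struct BusyError {
    pub entry_point: &'static str,
    pub in_flight: usize,
}

/// Counts in-flight database work for one entry point (federation, RSS, API...).
pub struct ConcurrencyCounter {
    name: &'static str,
    in_flight: AtomicUsize,
    limit: usize,
}

/// RAII guard: releases its slot when dropped.
pub struct InFlightGuard<'a> {
    counter: &'a ConcurrencyCounter,
}

impl ConcurrencyCounter {
    pub const fn new(name: &'static str, limit: usize) -> Self {
        Self { name, in_flight: AtomicUsize::new(0), limit }
    }

    /// Reserve a slot, or return BusyError if the limit is already reached.
    /// Under heavy contention the brief increment-then-undo can over-count
    /// for a moment, which errs on the side of rejecting.
    pub fn try_enter(&self) -> Result<InFlightGuard<'_>, BusyError> {
        // fetch_add returns the value *before* our increment.
        let previous = self.in_flight.fetch_add(1, Ordering::SeqCst);
        if previous >= self.limit {
            // Undo our increment before rejecting the request.
            self.in_flight.fetch_sub(1, Ordering::SeqCst);
            return Err(BusyError { entry_point: self.name, in_flight: previous });
        }
        Ok(InFlightGuard { counter: self })
    }

    pub fn in_flight(&self) -> usize {
        self.in_flight.load(Ordering::SeqCst)
    }
}

impl Drop for InFlightGuard<'_> {
    fn drop(&mut self) {
        self.counter.in_flight.fetch_sub(1, Ordering::SeqCst);
    }
}

// One counter per entry point; the limits here are made-up examples.
pub static FEDERATION: ConcurrencyCounter = ConcurrencyCounter::new("federation", 20);
pub static RSS: ConcurrencyCounter = ConcurrencyCounter::new("rss", 5);
pub static API: ConcurrencyCounter = ConcurrencyCounter::new("api", 50);
```

A handler would call `let _guard = RSS.try_enter()?;` before touching the database and map `BusyError` to whatever HTTP status the operator prefers (503 is a common choice).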

I’d probably build a class to help with this: once concurrency for an API goes over 5, mark the high-water point with a timestamp and start applying logic based on elapsed time. If concurrency is still above 5 and the elapsed time exceeds a threshold (say, 1 minute), return the busy error.
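The high-water logic described above could look roughly like this, as a minimal sketch using only `std`. `HighWaterTracker` and `Admission` are hypothetical names; the threshold of 5 and grace period of 1 minute come from the text. The current time is passed in as `now` so the behavior can be tested without sleeping, and `in_flight` could come from any concurrency counter.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq, Eq)]
pub enum Admission {
    Allowed,
    Busy,
}

pub struct HighWaterTracker {
    threshold: usize,
    grace: Duration,
    // When concurrency first went above `threshold`; None while at or below it.
    over_since: Mutex<Option<Instant>>,
}

impl HighWaterTracker {
    pub fn new(threshold: usize, grace: Duration) -> Self {
        Self { threshold, grace, over_since: Mutex::new(None) }
    }

    /// Decide whether to admit a request given the current concurrency.
    pub fn check(&self, in_flight: usize, now: Instant) -> Admission {
        let mut over_since = self.over_since.lock().unwrap();
        if in_flight <= self.threshold {
            // Back under the mark: forget the high-water timestamp.
            *over_since = None;
            return Admission::Allowed;
        }
        // First time over the mark: record when it happened.
        let since = *over_since.get_or_insert(now);
        if now.saturating_duration_since(since) > self.grace {
            Admission::Busy
        } else {
            // Short bursts above the threshold are tolerated.
            Admission::Allowed
        }
    }
}
```

The grace period means a brief spike never produces busy errors; only sustained overload does.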

Is Prometheus the right way to expose these numbers to operators who want to know about the thresholds? I’d probably also add a dedicated log file to track threshold crossings and busy errors.
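If Prometheus is the answer, the numbers could be exposed in its plain-text exposition format without pulling in a metrics crate. This is a minimal sketch; the metric names `lemmy_db_in_flight` and `lemmy_busy_errors_total` and the `EndpointStats` struct are hypothetical, and label values are assumed to be simple static identifiers, so no escaping is done.

```rust
use std::fmt::Write;

/// Hypothetical snapshot of one entry point's numbers.
pub struct EndpointStats {
    pub name: &'static str,
    pub in_flight: usize,
    pub busy_errors_total: u64,
}

/// Render stats in the Prometheus text exposition format.
pub fn render_prometheus(stats: &[EndpointStats]) -> String {
    let mut out = String::new();
    // A gauge: current in-flight database work per entry point.
    writeln!(out, "# HELP lemmy_db_in_flight Database operations currently in flight.").unwrap();
    writeln!(out, "# TYPE lemmy_db_in_flight gauge").unwrap();
    for s in stats {
        // `{{` and `}}` emit literal braces around the label set.
        writeln!(out, "lemmy_db_in_flight{{endpoint=\"{}\"}} {}", s.name, s.in_flight).unwrap();
    }
    // A counter: total busy errors returned since startup.
    writeln!(out, "# HELP lemmy_busy_errors_total Requests rejected with a busy error.").unwrap();
    writeln!(out, "# TYPE lemmy_busy_errors_total counter").unwrap();
    for s in stats {
        writeln!(out, "lemmy_busy_errors_total{{endpoint=\"{}\"}} {}", s.name, s.busy_errors_total).unwrap();
    }
    out
}
```

Serving this string from a `/metrics` endpoint lets Prometheus scrape it; the dedicated log file would still be useful for the exact moments thresholds were crossed.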

The front-end apps also need to cache “Trending communities”; I think lemmy-ui still pulls that live from PostgreSQL on every page refresh. I need to check whether anyone has added caching for it.
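Caching something like “Trending communities” only needs a small time-to-live cache. This is a minimal sketch using `std`; `TtlCache` and `get_or_refresh` are hypothetical names, and `load` stands in for whatever query produces the list. The lock is held during `load`, so concurrent callers wait for one refresh instead of all hitting the database at once; in async server code a blocking `std::sync::Mutex` held across a query would be a poor fit, so an async-aware lock would likely be needed there.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

pub struct TtlCache<T: Clone> {
    ttl: Duration,
    // (time loaded, cached value); None until the first load.
    entry: Mutex<Option<(Instant, T)>>,
}

impl<T: Clone> TtlCache<T> {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entry: Mutex::new(None) }
    }

    /// Return the cached value if it is younger than `ttl`, otherwise call `load`.
    pub fn get_or_refresh<F>(&self, now: Instant, load: F) -> T
    where
        F: FnOnce() -> T,
    {
        let mut entry = self.entry.lock().unwrap();
        if let Some((loaded_at, value)) = entry.as_ref() {
            if now.saturating_duration_since(*loaded_at) < self.ttl {
                return value.clone(); // fresh enough: skip the database
            }
        }
        // Missing or stale: reload and remember when we did it.
        let value = load();
        *entry = Some((now, value.clone()));
        value
    }
}
```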

  • RoundSparrow (OP) · 1 year ago

    Maybe I’m overthinking the performance problems.

    Deleting accounts probably creates a swarm of activity (I opened a GitHub issue about it), and it has already been a source of bug-triggering problems. But even without bugs, it still has to make the system run much slower. And nothing prevents someone from setting up a federated instance, creating a bunch of content, then deleting it, triggering overload on multiple servers.

    The variability of read performance could be directly tied to how many writes are going on from account deletion.

    Even comment reply chains seem to be triggering (replaceable) performance concerns.