It was last Sunday, July 23, that the site_aggregates issue, an UPDATE touching 1500 rows, was discovered.

Lemmy.ca made a clone of their database.

The UPDATE of 1500 rows in site_aggregates is an example of what keeps slipping through the cracks.
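The thread doesn't say what made the UPDATE touch 1500 rows. One common way an aggregate write fans out is an UPDATE with no row filter, so the sketch below assumes that cause purely for illustration. It uses Python's built-in sqlite3 as a stand-in for PostgreSQL and a hypothetical two-column site_aggregates table (one row per known site, with site_id 1 as the local instance). Cursor.rowcount reports how many rows each statement modified, which is the number worth watching on a cloned database.

```python
import sqlite3


def rows_touched(filter_local: bool, n_sites: int = 1500) -> int:
    """Return how many rows a counter UPDATE modifies in a hypothetical aggregates table."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema (not Lemmy's real one): one row per known site,
    # where site_id 1 stands for the local instance.
    conn.execute(
        "CREATE TABLE site_aggregates ("
        "site_id INTEGER PRIMARY KEY, comments INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO site_aggregates (site_id) VALUES (?)",
        [(i,) for i in range(1, n_sites + 1)],
    )
    sql = "UPDATE site_aggregates SET comments = comments + 1"
    if filter_local:
        sql += " WHERE site_id = 1"  # restrict the write to the local site's row
    touched = conn.execute(sql).rowcount  # rows modified by this UPDATE
    conn.close()
    return touched


print(rows_touched(filter_local=False))  # every row is rewritten
print(rows_touched(filter_local=True))   # only the local row is rewritten
```

On PostgreSQL itself, running the real statement under EXPLAIN ANALYZE on a clone, as Lemmy.ca did, reports the actual row counts for each plan node.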

lemmy.ml, as a long-running server, may hold data patterns that a freshly installed server does not easily reproduce.

Lemmy.ca cloning a live database to run EXPLAIN was exactly the kind of thing I wanted to see from lemmy.ml 2 months ago.

  • RoundSparrow (OP) · 1 year ago

    It is the data that the server holds that has been difficult to reproduce. There may even be data in these long-running servers, written by older versions, that testing on new code doesn’t generate in the same patterns.

    • RoundSparrow (OP) · 1 year ago

      The bot cleanup work has some interesting numbers regarding data in the database: https://sh.itjust.works/post/1823812

      lemmy.ml has a much wider range of dates on communities, posts, comments, and user accounts than new testing would generate. Even if you install a test server with the same quantity of data, the date patterns would come out very different from those of the organically grown lemmy.ml.
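To make the date-pattern point concrete, here is a minimal sketch comparing creation dates from a hypothetical fresh test seed (everything created within an hour of install) with dates spread across a multi-year window. The function names, the four-year window, and the uniform distribution are all assumptions for illustration; organic growth is burstier than uniform, so this only narrows the gap rather than reproducing lemmy.ml.

```python
import random
from datetime import datetime, timedelta


def fresh_install_dates(n: int, installed_at: datetime) -> list[datetime]:
    """Hypothetical test seed: all n rows created within one hour of install."""
    return [installed_at + timedelta(seconds=i * 3600 / n) for i in range(n)]


def spread_dates(n: int, start: datetime, end: datetime, seed: int = 0) -> list[datetime]:
    """Hypothetical seed with dates drawn uniformly across [start, end]."""
    rng = random.Random(seed)  # fixed seed so repeated runs match
    span = (end - start).total_seconds()
    return sorted(start + timedelta(seconds=rng.uniform(0, span)) for _ in range(n))


def span_days(dates: list[datetime]) -> int:
    """Whole days between the oldest and newest creation date."""
    return (max(dates) - min(dates)).days


now = datetime(2023, 7, 23)
print(span_days(fresh_install_dates(1500, now)))
print(span_days(spread_dates(1500, now - timedelta(days=4 * 365), now)))
```

Same row count, very different date range: any query or index that filters or sorts by date sees a different shape of data on each.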

      All I know is that lemmy.ml errors out every single day during routine browsing, and I haven’t seen any website throw this many errors in many years. Account deletions could also be causing these 2 to 4 minute periods of overload, even with the 0.18.3 fixes.

  • RoundSparrow (OP) · 1 year ago

    The live-server data not being reproducible on testing systems…

    I have avoided running performance tests against live servers through the API because I didn’t want to contribute to server overloads that were already happening when I just visited the normal interface to read content.

    Maybe it’s time to go back to that approach and see whether scheduled jobs can be detected. Can I detect, from the output of a post listing, when a server has updated rank? Can I detect lag time while PostgreSQL is writing that batch data?
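One way to ask the rank question from the outside is to fetch the same post listing twice and check whether posts present in both fetches changed their relative order. A minimal sketch, assuming you already have the post IDs from two successive fetches (the fetching itself and the function name are hypothetical). New votes can also reorder posts, so a single change proves little; reorders that recur at regular intervals would be the stronger hint of a scheduled rank update.

```python
def order_changed(before: list[int], after: list[int]) -> bool:
    """True if posts present in both listings appear in a different relative order."""
    common = set(before) & set(after)
    # Ignore posts that entered or left the listing; compare only the shared ones.
    before_common = [p for p in before if p in common]
    after_common = [p for p in after if p in common]
    return before_common != after_common


print(order_changed([1, 2, 3, 4], [1, 2, 3, 5]))  # one post swapped out, order kept
print(order_changed([1, 2, 3], [2, 1, 3]))        # two shared posts swapped places
```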

    Are PostgreSQL overloads happening at similar times on multiple servers when a ban with data removal, or an account deletion, replicates?
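For the lag and cross-server timing questions, a sketch of the analysis side only: given timed response samples from sparse, periodic requests to an instance (the request code is left out, partly to avoid adding load), find runs of slow or failed responses that last long enough to count as an overload window. Sample, the threshold, and min_duration are hypothetical; a min_duration of 120 seconds is just one reading of the "2 to 4 minute" periods mentioned above. Windows computed per instance could then be lined up against each other or against the times of known bans and account deletions.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    t: float        # seconds since the probe started
    latency: float  # response time in seconds; use float("inf") for an error or timeout


def overload_windows(samples: list[Sample], threshold: float,
                     min_duration: float) -> list[tuple[float, float]]:
    """Return (start, end) times of consecutive slow samples spanning at least min_duration seconds."""
    windows: list[tuple[float, float]] = []
    start = last = None
    for s in samples:  # samples are assumed to be in time order
        if s.latency > threshold:
            if start is None:
                start = s.t  # a slow run begins
            last = s.t
        else:
            if start is not None and last - start >= min_duration:
                windows.append((start, last))
            start = last = None  # a fast response ends the run
    if start is not None and last - start >= min_duration:  # run still open at the end
        windows.append((start, last))
    return windows


# Hypothetical probe every 30 s, slow between t=120 and t=270.
samples = [Sample(float(t), 5.0 if 120 <= t <= 270 else 0.2) for t in range(0, 601, 30)]
print(overload_windows(samples, threshold=2.0, min_duration=120.0))
```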

  • RoundSparrow (OP) · 1 year ago

    !lemmydev@lemm.ee

    Unrelated, but I can never find that community by name because its display name is “Lemmy App Dev”, while its database key is “lemmydev”, which is what I remember.