It was last Sunday, July 23, that the site_aggregates issue, an UPDATE touching 1500 rows, was discovered.
Lemmy.ca made a clone of their database.
The 1500-row UPDATE on site_aggregates is an example of what keeps slipping through the cracks.
lemmy.ml, as a long-running server, may have data patterns that a fresh-install server cannot easily reproduce.
Lemmy.ca cloning a live database to run EXPLAIN was exactly the kind of thing I wanted to see from lemmy.ml 2 months ago.
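A minimal sketch of that kind of check on a cloned database. The statement below is a hypothetical shape of the scheduled update, not the exact one in Lemmy's scheduler; the point is the EXPLAIN workflow, run only against the clone:

```sql
-- On the CLONE only: measure what the scheduled update actually does.
-- The statement shape here is assumed; the real scheduled job may differ.
BEGIN;

EXPLAIN (ANALYZE, BUFFERS)
UPDATE site_aggregates
SET users = (SELECT count(*) FROM person WHERE local = true);
-- With no WHERE clause, every row in site_aggregates gets rewritten.
-- On a long-federated server that can be ~1500 rows instead of 1.

ROLLBACK;  -- never persist test writes, even on a clone
```

Wrapping the EXPLAIN in BEGIN/ROLLBACK means the plan and buffer counts are real (ANALYZE actually executes the UPDATE) but the clone's data is left untouched for repeat runs.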
Live-server data is simply not reproducible on testing systems…
I have avoided running performance tests against live servers via the API because I didn't want to contribute to the overloads that were already happening when I merely visited the normal interface to read content.
Maybe it's time to go back to that approach and see whether scheduled jobs can be detected from the outside. Can I detect when a server has updated rank based on the output of a post listing? Can I detect the lag while PostgreSQL is writing that batch of data?
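A sketch of that probe, under assumptions: the listing URL shape and the response fields (`posts`, `post.id`, `counts.score`) follow what I understand of Lemmy's v3 API, and the heuristic is mine, namely that a rank-sorted listing reordering itself while per-post scores stay unchanged hints that the server-side rank job just ran rather than new votes arriving:

```python
import json
import time
import urllib.request

# Assumed endpoint shape for a rank-sorted listing on some instance.
LIST_URL = "https://example.instance/api/v3/post/list?sort=Active&limit=20"


def fetch_listing(url: str) -> list[dict]:
    """Fetch one page of posts; keep only each post's id and score."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        data = json.load(resp)
    return [
        {"id": pv["post"]["id"], "score": pv["counts"]["score"]}
        for pv in data["posts"]
    ]


def rank_job_suspected(prev: list[dict], curr: list[dict]) -> bool:
    """True when the listing order changed but per-post scores did not:
    the reshuffle then likely came from a server-side rank
    recalculation, not from new votes."""
    prev_ids = [p["id"] for p in prev]
    curr_ids = [p["id"] for p in curr]
    if prev_ids == curr_ids:
        return False
    prev_scores = {p["id"]: p["score"] for p in prev}
    curr_scores = {p["id"]: p["score"] for p in curr}
    common = prev_scores.keys() & curr_scores.keys()
    return all(prev_scores[i] == curr_scores[i] for i in common)


if __name__ == "__main__":
    previous = fetch_listing(LIST_URL)
    while True:
        time.sleep(60)  # gentle polling; the point is NOT to add load
        current = fetch_listing(LIST_URL)
        if rank_job_suspected(previous, current):
            print(f"possible rank update at {time.strftime('%H:%M:%S')}")
        previous = current
```

One read-only request per minute keeps the probe far below normal browsing load, which matters given the overload concern above.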
Are PostgreSQL overloads happening at similar times on multiple servers when a ban/removal or an account deletion replicates?
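One way to test that question without touching the servers' internals: record timestamps of observed slow responses per instance, then check whether the spikes line up in time. Co-occurring spikes across instances would point at federated writes replicating rather than local load. A minimal sketch, assuming we already have sorted lists of spike timestamps per server:

```python
def coincident_spikes(a: list[float], b: list[float],
                      tolerance_s: float = 30.0) -> list[tuple[float, float]]:
    """Return pairs of spike timestamps (one from each server) that fall
    within `tolerance_s` seconds of each other. Inputs are sorted lists
    of epoch seconds; runs in O(len(a) + len(b)) amortized."""
    pairs = []
    j = 0
    for t in a:
        # advance past b-spikes too old to match this a-spike
        while j < len(b) and b[j] < t - tolerance_s:
            j += 1
        # collect every b-spike within the tolerance window
        k = j
        while k < len(b) and b[k] <= t + tolerance_s:
            pairs.append((t, b[k]))
            k += 1
    return pairs
```

If `coincident_spikes` returns many pairs while each server also has plenty of unmatched spikes of its own, that distinguishes replication-driven overload from coincidence.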