Based on a lot of personal messaging app development and SQL database experience back in the low-performance hardware days…
I speculate that the developers of Lemmy have not populated the database with large amounts of test data. It shows a lot of symptoms of a codebase that never had a test-data-generation app that would create say 1000 test user accounts, 50 test communities, stuff 1000 random postings in each test community - and then put 10 to 150 random comments of various lengths on each of those postings.
It looks to me more like it started with an empty SQL database and has incrementally grown to a few thousand end-users creating a few posts a day with a relatively small number of comments on those postings.
Growing pains are normal, but it may help to have an instance out there for performance testing of the database queries and insert performance… and even invite end-users to hammer on the test instance all at a scheduled time (testing party) to find where the performance bottlenecks lie.
1000 test user accounts, 50 test communities, stuff 1000 random postings in each test community - and then put 10 to 150 random comments of various lengths on each of those postings.
For clarity, I mean 1000 random postings and 150 random comments per posting, PER USER.
1000 users * 50 communities * 1000 postings * 150 comments.
That level of testing should have been done 18 months ago in the project as a minimum, looking back on the code development. I’d say 1000 times more (plus images on at least half the postings and 1 in 8 comments), at least, in 2023. Especially since this kind of project attracts people running out of their own personal pocket, spending their own money, for server hardware. The no-advertising income, no monetization focus puts a higher concern on keeping the server efficient on requirements.
The code also needs to give early warnings, in the face of server operators on the admin web pages, of disk-full, performance nearing crash thresholds, etc. The site needs very basic warning signs, even if its bash scrips going into SQL for the moment, quick and dirty tools are fine.
These numbers tell the story, Lemmy.nl has over 40% more comment data than other servers:
The issues with TRIGGER statements on site_aggregates would have been difficult to simulate, requiring 1800 federated instances to match what Lemmy.ml likely has in the live database. But an example of how testing and studying execution time of SQL is important with quantities of data matching production.
Lemmy.ca found the issue by making a full copy of their database to test on locally to see the triggers were hitting so many rows… and that site goes back to 2020 data.