Speculation about Lemmy development over the past couple years - and the database content

RoundSparrow · edit-2 2 years ago

Create 1000 test user accounts and 50 test communities, stuff 1000 random postings into each test community, and then put 10 to 150 random comments of various lengths on each of those postings.

For clarity, I mean 1000 random postings and 150 random comments per posting, PER USER.

1000 users * 50 communities * 1000 postings * 150 comments.
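
To make the scale concrete, here is a small sketch that multiplies out the product above. It reads the figures literally (every user posts 1000 times in every community, and each posting gets between 10 and 150 comments); the variable names are only for illustration.

```python
# Rough size of the proposed test data set, reading the figures literally.
users = 1000
communities = 50
postings_per_user_per_community = 1000
min_comments_per_posting = 10
max_comments_per_posting = 150

# Every user posts in every community.
total_postings = users * communities * postings_per_user_per_community

# Each posting receives between 10 and 150 comments.
min_total_comments = total_postings * min_comments_per_posting
max_total_comments = total_postings * max_comments_per_posting

print(f"postings:         {total_postings:,}")      # 50,000,000
print(f"comments (low):   {min_total_comments:,}")  # 500,000,000
print(f"comments (high):  {max_total_comments:,}")  # 7,500,000,000
```

So the upper end is on the order of 7.5 billion comment rows, which is the kind of volume where slow SQL shows up long before a live site hits it.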

Looking back on the code development, that level of testing should have been done 18 months ago in the project as a minimum. For 2023, I’d say at least 1000 times more (plus images on at least half the postings and 1 in 8 comments), especially since this kind of project attracts people who run servers out of their own pocket and spend their own money on hardware. With no advertising income and no monetization focus, keeping the server's hardware requirements low matters even more.

The code also needs to put early warnings in front of server operators on the admin web pages: disk full, performance nearing crash thresholds, etc. The site needs very basic warning signs, even if it's bash scripts going into SQL for the moment; quick and dirty tools are fine.
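
As an idea of how small a quick and dirty disk-full warning could be, here is a sketch in Python using only the standard library. The 90% threshold, the function names, and the message wording are my assumptions, not anything that exists in Lemmy; a real version would feed the message to the admin page instead of printing it.

```python
import shutil


def check_usage(used, total, threshold=0.90):
    """Return a warning string if used/total is at or above threshold, else None."""
    if total <= 0:
        return None  # nothing sensible to report for an empty or unknown volume
    fraction_used = used / total
    if fraction_used >= threshold:
        return f"WARNING: disk is {fraction_used:.0%} full (threshold {threshold:.0%})"
    return None


def disk_warning(path="/", threshold=0.90):
    """Check the filesystem holding `path` (hypothetical helper for an admin page)."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return check_usage(usage.used, usage.total, threshold)


if __name__ == "__main__":
    message = disk_warning(".")
    print(message or "disk usage OK")
```

The same shape works for other thresholds (connection counts, slow query counts): one cheap measurement, one comparison, one line shown to the operator.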

RoundSparrow · 2 years ago

These numbers tell the story, Lemmy.nl has over 40% more comment data than other servers:

RoundSparrow · 1 year ago

The issues with TRIGGER statements on site_aggregates would have been difficult to simulate, requiring 1800 federated instances to match what Lemmy.ml likely has in the live database. But it is an example of why testing and studying the execution time of SQL matters, and why it has to be done with quantities of data matching production.

Lemmy.ca found the issue by making a full copy of their database to test on locally, which showed that the triggers were hitting so many rows… and that site goes back to 2020 data.
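
To show why this kind of problem only appears with production-sized data, here is a toy reproduction. Everything in it is an assumption for illustration: it uses SQLite in memory instead of Lemmy's PostgreSQL, a made-up two-column site_aggregates table with one row per federated instance, and a trigger whose UPDATE has no WHERE clause as a stand-in for "hitting so many rows". It is not Lemmy's actual schema or trigger code.

```python
import sqlite3
import time


def build_db(instance_rows, scoped):
    """Create a toy database with `instance_rows` rows in site_aggregates.

    scoped=False: the trigger's UPDATE has no WHERE clause and touches every row.
    scoped=True:  the trigger only updates the local site's row (site_id = 1).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE site_aggregates ("
        "site_id INTEGER PRIMARY KEY, comments INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute("CREATE TABLE demo_comment (id INTEGER PRIMARY KEY, body TEXT)")
    where = "WHERE site_id = 1" if scoped else ""
    conn.execute(
        f"""
        CREATE TRIGGER demo_comment_count AFTER INSERT ON demo_comment
        BEGIN
            UPDATE site_aggregates SET comments = comments + 1 {where};
        END
        """
    )
    conn.executemany(
        "INSERT INTO site_aggregates (site_id) VALUES (?)",
        ((i,) for i in range(1, instance_rows + 1)),
    )
    conn.commit()
    return conn


def insert_comments(conn, n):
    """Insert n comments; return (elapsed seconds, aggregate rows the trigger changed)."""
    start = time.perf_counter()
    conn.executemany(
        "INSERT INTO demo_comment (body) VALUES (?)", (("x",) for _ in range(n))
    )
    conn.commit()
    elapsed = time.perf_counter() - start
    touched = conn.execute(
        "SELECT COUNT(*) FROM site_aggregates WHERE comments > 0"
    ).fetchone()[0]
    return elapsed, touched


if __name__ == "__main__":
    for scoped in (False, True):
        conn = build_db(instance_rows=1800, scoped=scoped)
        elapsed, touched = insert_comments(conn, 1000)
        print(f"scoped={scoped}: {touched} aggregate rows touched, {elapsed:.3f}s")
        conn.close()
```

With a single row in site_aggregates the two triggers behave the same, so a fresh test install never shows the difference. Only once the table holds something like 1800 rows does every comment insert turn into 1800 row updates, which is why copying a real database and timing the SQL against it found the issue.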