Federated like (vote) counts are wildly different from server to server as it stands now. And unless something is way off on my particular server, PostgreSQL is reporting a mean (average) time of 0.4 seconds per single comment vote INSERT, and post vote INSERTs are similar. (NOTE: my server uses classic hard drives, benchmarked at 100 MB/sec, not an SSD.)
Discussion of the SQL statement for a single comment vote insert: https://lemmy.ml/post/1446775
Every single VOTE is both an HTTP transaction from the remote server and a SQL transaction. I am looking into whether PostgreSQL can batch inserts so it doesn't check all the constraints at each single insert: https://www.postgresql.org/docs/current/sql-set-constraints.html
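To illustrate the idea: if the foreign keys involved were declared DEFERRABLE, a transaction could postpone those checks until COMMIT. This is only a sketch under assumptions, not Lemmy's actual code: it uses tokio-postgres (Lemmy itself uses Diesel), an abbreviated comment_like column list, and Lemmy's schema may not declare its constraints DEFERRABLE today.

```rust
use tokio_postgres::types::ToSql;
use tokio_postgres::{Client, Error};

async fn insert_votes_deferred(
    client: &mut Client,
    votes: &[(i32, i32, i16)], // (comment_id, person_id, score), illustrative shape
) -> Result<(), Error> {
    let tx = client.transaction().await?;
    // Postpone checking of all DEFERRABLE constraints until COMMIT.
    // Constraints not declared DEFERRABLE are still checked per statement.
    tx.batch_execute("SET CONSTRAINTS ALL DEFERRED").await?;
    let stmt = tx
        .prepare("INSERT INTO comment_like (comment_id, person_id, score) VALUES ($1, $2, $3)")
        .await?;
    for (comment_id, person_id, score) in votes {
        let params: [&(dyn ToSql + Sync); 3] = [comment_id, person_id, score];
        tx.execute(&stmt, &params).await?;
    }
    // Deferred constraints are checked here, once, instead of per-INSERT.
    tx.commit().await
}
```

Note that SET CONSTRAINTS only helps for constraints declared DEFERRABLE; plain unique indexes are still checked row by row.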
Could the Rust code for inserts from federation reasonably be modified to BEGIN TRANSACTION, accumulate, say, 10 comment_like INSERTs, and then do a single COMMIT of all of them at once? Possibly with a timer so that if, say, 15 seconds pass with no new like entries from remote servers, a COMMIT flushes whatever is pending based on the timeout.
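Something like the following loop is what I have in mind. A sketch only, with made-up names: the Vote struct, the channel wiring, and the constants are all illustrative, and the actual database write is elided.

```rust
use std::time::Duration;
use tokio::sync::mpsc::Receiver;
use tokio::time::interval;

struct Vote {
    comment_id: i32,
    person_id: i32,
    score: i16,
}

const BATCH_SIZE: usize = 10; // COMMIT after every 10th buffered INSERT
const FLUSH_EVERY: Duration = Duration::from_secs(15); // fallback timer

async fn vote_batcher(mut rx: Receiver<Vote>) {
    let mut buf: Vec<Vote> = Vec::with_capacity(BATCH_SIZE);
    let mut tick = interval(FLUSH_EVERY);
    loop {
        tokio::select! {
            maybe_vote = rx.recv() => match maybe_vote {
                Some(v) => {
                    buf.push(v);
                    if buf.len() >= BATCH_SIZE {
                        flush(&mut buf).await; // BEGIN; 10 INSERTs; COMMIT
                    }
                }
                // Channel closed: flush whatever is left and stop.
                None => {
                    flush(&mut buf).await;
                    break;
                }
            },
            // Timer fired: flush so quiet periods never strand pending votes.
            _ = tick.tick() => flush(&mut buf).await,
        }
    }
}

async fn flush(buf: &mut Vec<Vote>) {
    if buf.is_empty() {
        return;
    }
    // One transaction for the whole batch: BEGIN, N INSERTs, a single COMMIT.
    // (Actual database code elided; see the deferred-constraints sketch above.)
    buf.clear();
}
```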
The storage I/O from writing votes alone is pretty large…
Completely off topic, but you’ve linked to another post; I followed it and ended up on a different server, where I don’t have an account. I wonder if there is a possible solution.
There is an open GitHub issue on that topic, and the /c/lemmyfederation and /c/lemmycode communities for discussion.
For anyone else looking, here is the issue: https://github.com/LemmyNet/lemmy/issues/3259
Thanks, I didn’t have the link handy.
Batching the inserts up only kicks the can down the road a few weeks. We need a 500x improvement in insertion time.
The proposal has been raised (by me) to move all federation out of lemmy_server into a separate service with its own queue. I think that would make it easier for people to work on and update the code. The email systems I have worked with that use a database storage backend ran their own MTA service, not one embedded in the main app’s core. I also think Reddit accepts incoming data before it reaches PostgreSQL, as I’ve seen comments get backed up when one of their servers or services goes offline.
it is already in a different repo, just running from the same process. since 0.18 (with debug mode off) it should also be running in a somewhat efficient multi-threaded environment
Or, build some sort of in-memory buffer that batches the actions and commits them to the DB in batches based on time/size, à la AWS Kinesis Firehose.
Every action gets recorded in memory in its own separate buffer (i.e., one for posts, one for comments, one for likes/dislikes, etc.) and is merged into query results on each request. When a buffer reaches X MB in size, it is batch inserted/updated into the database to reduce the overhead.
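To make the write side concrete, here’s an illustrative sketch of one such buffer (the likes one); the struct layout and the size estimate are made up, not Lemmy’s real types.

```rust
use std::mem::size_of;

const FLUSH_BYTES: usize = 4 * 1024 * 1024; // the "X MB" threshold, e.g. 4 MB

struct LikeRow {
    comment_id: i32,
    person_id: i32,
    score: i16,
}

#[derive(Default)]
struct LikeBuffer {
    rows: Vec<LikeRow>,
}

impl LikeBuffer {
    fn push(&mut self, row: LikeRow) {
        self.rows.push(row);
        // Crude in-memory size estimate; real code might track payload bytes.
        if self.rows.len() * size_of::<LikeRow>() >= FLUSH_BYTES {
            self.flush();
        }
    }

    fn flush(&mut self) {
        if self.rows.is_empty() {
            return;
        }
        // Batch write, e.g. one INSERT ... VALUES (...), (...), ... statement
        // inside a single transaction. (Database code elided.)
        self.rows.clear();
    }
}
```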
Well, I’m singling out “like/dislike” (votes) because I consider them less of a priority. The actual comments and postings are taking over 1.0 second to INSERT, and that’s the bulk of the site’s purpose: to share actual content. If the likes lag by 15 seconds, is that such a big deal?
The way I’d imagine this working is that the buffered data is supplemented (augmented) into the query results.
Basically, there’d be a new cache/injection layer that sits between the application and the database. Instead of the application working directly with the existing ORM (I’m not actually sure how Rust does this, so I’m speaking in broad terms), it would work against this layer, which maintains the buffer and interfaces with the ORM. On write actions, the layer fills the buffer until it reaches its size limit or some time has passed, then performs the write in bulk; on read actions, it queries through the ORM and then weaves the buffered data into the response back to the application.
Thus, from the user’s perspective, nothing would change, and they’d be none the wiser.
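The read path might then look something like this hypothetical sketch, where buffered, not-yet-flushed votes are woven into the score the database returned (all names here are illustrative):

```rust
struct LikeRow {
    comment_id: i32,
    person_id: i32,
    score: i16,
}

fn score_for_comment(db_score: i64, buffered: &[LikeRow], comment_id: i32) -> i64 {
    // Weave pending (unflushed) votes into the value the database returned,
    // so the caller sees a view consistent with what was already accepted.
    let pending: i64 = buffered
        .iter()
        .filter(|l| l.comment_id == comment_id)
        .map(|l| l.score as i64)
        .sum();
    db_score + pending
}
```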
Someone else commented about the interactive website being slow with Votes… https://lemmy.ml/comment/890862
well tuned, a single insert should take less than a tenth of a millisecond on postgresql. if you’re seeing slow times it might be due to very different things, especially the huge locking hot_rank updates that make all inserts/updates on the comment and post tables pause until done (which will show up as slower times in the query stats you’re looking at)
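one way to check whether inserts really are stuck behind those updates is to ask postgres which sessions are blocked and by whom. a sketch only (tokio-postgres used for brevity; pg_blocking_pids() needs postgresql 9.6+):

```rust
use tokio_postgres::{Client, Error};

async fn print_blocked_queries(client: &Client) -> Result<(), Error> {
    // List every session that is currently waiting on locks held by another
    // session, along with the blocker PIDs and the waiting query text.
    let rows = client
        .query(
            "SELECT pid, pg_blocking_pids(pid) AS blocked_by, query \
             FROM pg_stat_activity \
             WHERE cardinality(pg_blocking_pids(pid)) > 0",
            &[],
        )
        .await?;
    for row in rows {
        let pid: i32 = row.get("pid");
        let blocked_by: Vec<i32> = row.get("blocked_by");
        let query: String = row.get("query");
        println!("pid {pid} blocked by {blocked_by:?}: {query}");
    }
    Ok(())
}
```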
I mixed up the units; it is typically taking 0.4 milliseconds. I really want to get some of these stats out of the major servers (Beehaw, Lemmy.world, Lemmy.ml).
yes, real stats would be real helpful. what about those user queries you said take 10 seconds? is that still true? maybe you could publish a new overview of what you find?
> what about those user queries you said take 10 seconds? is that still true?
I made a major mistake (discovered yesterday); it’s been over a decade since I ran a major Postgres website. The units were milliseconds (often fractions of a millisecond), not seconds. So that was 10 milliseconds.
At this point I don’t have a reason to hide the data; here are the two significant stat pages from my server:
Record counts in tables: https://lemmyadmin.bulletintree.com/query/pgcounts?output=table
pg_stat_statements extension output, curated columns: https://lemmyadmin.bulletintree.com/query/pgstatements?output=table (full columns are available if you need them: https://lemmyadmin.bulletintree.com/query/pgstatements1?output=table)
Feel free to refresh these pages as much as you like. The pg_stat_statements counters can be reset; I think I reset them 24 hours ago.
The server is 4 ARM cores, 24 GB of RAM, and 200 GB of IDE-level-performance storage (Oracle Cloud). There are no interactive users other than myself, plus perhaps some Google crawler traffic. Federation is the main activity, and I’ve been subscribing to as many big communities as I can for weeks.
Damn it - I made a mistake
Ok, re-reading the documentation, I made a major error interpreting these results.
> mean_exec_time double precision: Mean time spent executing the statement, in milliseconds
All my statements about an INSERT on likes taking 1/3 of a second are wrong; it’s less than 1 millisecond. Although it sure doesn’t feel that fast when you are interactively using Lemmy 0.18 and pressing the Like button; it seems rather sluggish. I’ve almost never seen times in fractions of a millisecond, but here it is.
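For reference, here’s a small sketch of pulling mean_exec_time straight out of pg_stat_statements so the unit (milliseconds) is explicit. The use of tokio-postgres and the ILIKE pattern are just illustrative assumptions:

```rust
use tokio_postgres::{Client, Error};

async fn vote_insert_timings(client: &Client) -> Result<(), Error> {
    // Read call counts and mean execution time for the vote INSERT statements.
    let rows = client
        .query(
            "SELECT calls, mean_exec_time, query \
             FROM pg_stat_statements \
             WHERE query ILIKE 'INSERT INTO comment_like%' \
             ORDER BY mean_exec_time DESC",
            &[],
        )
        .await?;
    for row in rows {
        let calls: i64 = row.get("calls");
        let mean_ms: f64 = row.get("mean_exec_time"); // milliseconds, not seconds!
        let query: String = row.get("query");
        println!("{calls} calls, {mean_ms:.3} ms mean: {query}");
    }
    Ok(())
}
```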