Lemmy’s use of PostgreSQL was developed with this philosophy, intentional or otherwise:
- Real-time client connection and notification via websocket, which receives every single action that passes through PostgreSQL. One upvote is instantly sent to the client; one new comment instantly appears in lemmy-ui as of version 0.17.4.
- INSERT overhead for PostgreSQL. As soon as a Lemmy post or comment is inserted, a parallel post_aggregates or comment_aggregates row is created alongside it.
- INSERT counting overhead. Lemmy goes to extreme effort to count things, all the time. Every new INSERT of a comment or post does a real-time update of the server-wide totals. This is done via a SQL UPDATE, not by issuing a COUNT(*) over the rows when the data is actually requested (this trigger pattern is sketched at the end of this section).
- No archiving or secondary-storage concept. Data is either in the main PostgreSQL tables or nowhere at all.
- Raw numbers, local and unique to each instance, identify every comment and post. But community names and usernames tend to be better known than raw numbers.
- Sorting choices are presented on many things: communities, posts, comments. And new methods of sorting and slicing the data keep being added in 2023.
- No caching of data. The Lemmy developers have gone to extremes to avoid caching, either in lemmy-ui or within the Rust code of lemmy_server. Lemmy philosophy favors a constant connection to a single PostgreSQL instance.
- User preferences and customization are offloaded to PostgreSQL to do the heavy lifting. PostgreSQL has to look at each user's activity to know whether they have saved a post, previously read it, upvoted it, or even blocked the person who created it (the resulting JOINs are sketched at the end of this section).
- Language choice is built into the system early on, but I see no evidence it has proven useful. It carries a high overhead in how many PostgreSQL rows each site stores, and it is used in filtering. More often than not, I've found end-users confused about why they can't find content after accidentally turning off language choices in lemmy-ui.
- All fields on SELECT. Throughout the Rust Diesel ORM code, every field in every table is touched.
- SELECT statements are almost always machine-generated by the ORM; TRIGGER FUNCTION logic is hand-written.
An example from the Rust code, where a raw SQL batch update is hand-written because Diesel cannot express it:
```rust
/// Runs the hot rank update query in batches until all rows have been processed.
/// In `where_clause` and `set_clause`, "a" will refer to the current aggregates table.
/// Locked rows are skipped in order to prevent deadlocks (they will likely get updated on the next
/// run)
fn process_hot_ranks_in_batches(
  conn: &mut PgConnection,
  table_name: &str,
  where_clause: &str,
  set_clause: &str,
) {
  let process_start_time = NaiveDateTime::from_timestamp_opt(0, 0).expect("0 timestamp creation");

  let update_batch_size = 1000; // Bigger batches than this tend to cause seq scans
  let mut processed_rows_count = 0;
  let mut previous_batch_result = Some(process_start_time);
  while let Some(previous_batch_last_published) = previous_batch_result {
    // Raw `sql_query` is used as a performance optimization - Diesel does not support doing this
    // in a single query (neither as a CTE, nor using a subquery)
    let result = sql_query(format!(
      r#"WITH batch AS (SELECT a.id
               FROM {aggregates_table} a
               WHERE a.published > $1 AND ({where_clause})
               ORDER BY a.published
               LIMIT $2
               FOR UPDATE SKIP LOCKED)
         UPDATE {aggregates_table} a {set_clause}
             FROM batch WHERE a.id = batch.id RETURNING a.published;
      "#,
      aggregates_table = table_name,
      set_clause = set_clause,
      where_clause = where_clause
    ))
    .bind::<Timestamp, _>(previous_batch_last_published)
    .bind::<Integer, _>(update_batch_size)
    .get_results::<HotRanksUpdateResult>(conn);

    match result {
      Ok(updated_rows) => {
        processed_rows_count += updated_rows.len();
        previous_batch_result = updated_rows.last().map(|row| row.published);
      }
      Err(e) => {
        error!("Failed to update {} hot_ranks: {}", table_name, e);
        break;
      }
    }
  }
  info!(
    "Finished process_hot_ranks_in_batches execution for {} (processed {} rows)",
    table_name, processed_rows_count
  );
}
```
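
For readers who do not write Diesel, this is roughly the statement that format string produces for one batch, assuming the caller passed post_aggregates and a hypothetical hot_rank set/where clause (the real clauses live in the calling code and are not shown above, and the hot_rank() function is an assumption for illustration):

```sql
-- Sketch of a single batch. $1 is the last published timestamp returned by the
-- previous batch, $2 is the 1000-row batch size.
WITH batch AS (
    SELECT a.id
    FROM post_aggregates a
    WHERE a.published > $1 AND (a.hot_rank != 0)   -- hypothetical where_clause
    ORDER BY a.published
    LIMIT $2
    FOR UPDATE SKIP LOCKED                         -- rows locked elsewhere are skipped
)
UPDATE post_aggregates a
SET hot_rank = hot_rank(a.score, a.published)      -- hypothetical set_clause
FROM batch
WHERE a.id = batch.id
RETURNING a.published;                             -- seeds the next batch's $1
```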
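To make the counting overhead concrete, here is a sketch of the trigger pattern being described. The table and column names (comment, site_aggregates.comments) are assumptions for illustration, not copied from Lemmy's migrations; the point is that every INSERT also pays for an UPDATE, versus counting only when somebody asks:

```sql
-- Counting at INSERT time: every new comment also pays for an UPDATE
-- of the running total (assumed table/column names).
CREATE FUNCTION site_aggregates_comment_count()
RETURNS trigger
LANGUAGE plpgsql
AS $$
BEGIN
  UPDATE site_aggregates
  SET comments = comments + 1;
  RETURN NULL;
END;
$$;

CREATE TRIGGER site_aggregates_comment_count
AFTER INSERT ON comment
FOR EACH ROW
EXECUTE FUNCTION site_aggregates_comment_count();

-- Counting at read time instead: compute the total only when it is requested.
SELECT count(*) FROM comment;
```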
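Likewise, a sketch of the per-user heavy lifting described in the list above. The table names (post_read, post_saved, post_like, person_block) are assumptions patterned on the aggregate tables already mentioned, not Lemmy's actual view definition; note how every column of post and post_aggregates comes along for the ride, and how the chosen sort just swaps the ORDER BY clause:

```sql
-- Sketch only: assumed table names. For a logged-in user ($1), every post
-- listing has to join against that user's own activity to decide
-- read/saved/voted/blocked for each row.
SELECT p.*, pa.*,
       pr.post_id  IS NOT NULL AS is_read,
       ps.post_id  IS NOT NULL AS is_saved,
       pl.score                AS my_vote,
       pb.person_id IS NOT NULL AS creator_blocked
FROM post p
JOIN post_aggregates pa   ON pa.post_id = p.id
LEFT JOIN post_read pr    ON pr.post_id = p.id AND pr.person_id = $1
LEFT JOIN post_saved ps   ON ps.post_id = p.id AND ps.person_id = $1
LEFT JOIN post_like pl    ON pl.post_id = p.id AND pl.person_id = $1
LEFT JOIN person_block pb ON pb.target_id = p.creator_id AND pb.person_id = $1
ORDER BY pa.hot_rank DESC   -- "Hot"; other sort choices swap this clause
LIMIT 20;
```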