Lemmy.world shared a slow query, in detail! Yey!

SELECT post . id, post . name, post . url, post . body, post . creator_id, post . community_id, post . removed, post . locked, post . published, post . updated, post . deleted, post . nsfw, post . embed_title, post . embed_description, post . thumbnail_url, post . ap_id, post . local, post . embed_video_url, post . language_id, post . featured_community, post . featured_local, person . id, person . name, person . display_name, person . avatar, person . banned, person . published, person . updated, person . actor_id, person . bio, person . local, person . private_key, person . public_key, person . last_refreshed_at, person . banner, person . deleted, person . inbox_url, person . shared_inbox_url, person . matrix_user_id, person . admin, person . bot_account, person . ban_expires, person . instance_id, community . id, community . name, community . title, community . description, community . removed, community . published, community . updated, community . deleted, community . nsfw, community . actor_id, community . local, community . private_key, community . public_key, community . last_refreshed_at, community . icon, community . banner, community . followers_url, community . inbox_url, community . shared_inbox_url, community . hidden, community . posting_restricted_to_mods, community . instance_id, community . moderators_url, community . featured_url, community_person_ban . id, community_person_ban . community_id, community_person_ban . person_id, community_person_ban . published, community_person_ban . expires, post_aggregates . id, post_aggregates . post_id, post_aggregates . comments, post_aggregates . score, post_aggregates . upvotes, post_aggregates . downvotes, post_aggregates . published, post_aggregates . newest_comment_time_necro, post_aggregates . newest_comment_time, post_aggregates . featured_community, post_aggregates . featured_local, post_aggregates . hot_rank, post_aggregates . hot_rank_active, post_aggregates . community_id, post_aggregates . creator_id, community_follower . id, community_follower . community_id, community_follower . person_id, community_follower . published, community_follower . pending, post_saved . id, post_saved . post_id, post_saved . person_id, post_saved . published, post_read . id, post_read . post_id, post_read . person_id, post_read . published, person_block . id, person_block . person_id, person_block . target_id, person_block . published, post_like . score, coalesce ( ( post_aggregates . comments - person_post_aggregates . read_comments ), post_aggregates . comments ) FROM ( ( ( ( ( ( ( ( ( ( ( ( post_aggregates INNER JOIN person ON ( post_aggregates . creator_id = ? . id ) ) INNER JOIN post ON ( post_aggregates . post_id = ? . id ) ) INNER JOIN community ON ( post_aggregates . community_id = ? . id ) ) LEFT OUTER JOIN community_person_ban ON ( ( post_aggregates . community_id = ? . community_id ) AND ( community_person_ban . person_id = ? . creator_id ) ) ) LEFT OUTER JOIN community_follower ON ( ( post_aggregates . community_id = ? . community_id ) AND ( community_follower . person_id = ? ) ) ) LEFT OUTER JOIN post_saved ON ( ( post_aggregates . post_id = ? . post_id ) AND ( post_saved . person_id = ? ) ) ) LEFT OUTER JOIN post_read ON ( ( post_aggregates . post_id = ? . post_id ) AND ( post_read . person_id = ? ) ) ) LEFT OUTER JOIN person_block ON ( ( post_aggregates . creator_id = ? . target_id ) AND ( person_block . person_id = ? ) ) ) LEFT OUTER JOIN community_block ON ( ( post_aggregates . community_id = ? . community_id ) AND ( community_block . person_id = ? ) ) ) LEFT OUTER JOIN post_like ON ( ( post_aggregates . post_id = ? . post_id ) AND ( post_like . person_id = ? ) ) ) LEFT OUTER JOIN person_post_aggregates ON ( ( post_aggregates . post_id = ? . post_id ) AND ( person_post_aggregates . person_id = ? ) ) ) LEFT OUTER JOIN local_user_language ON ( ( post . language_id = ? . language_id ) AND ( local_user_language . local_user_id = ? ) ) ) WHERE ( ( ( ( ( ( ( ( community . removed = ? ) AND ( community . deleted = ? ) ) AND ( post . removed = ? ) ) AND ( post . deleted = ? ) ) AND ( community . local = ? ) ) AND ( ( community . hidden = ? ) OR ( community_follower . person_id = ? ) ) ) AND ( post . nsfw = ? ) ) AND ( community . nsfw = ? ) ) ORDER BY post_aggregates . featured_local DESC, post_aggregates . hot_rank DESC, post_aggregates . published DESC LIMIT ? OFFSET ?

  • RoundSparrow @ BT@bulletintree.comOPM
    link
    fedilink
    arrow-up
    1
    ·
    1 year ago
    SELECT post . id, post . name, post . url, post . body, post . creator_id, post . community_id, post . removed, post . locked, post . published, post . updated, post . deleted, post . nsfw, post . embed_title, post . embed_description, post . thumbnail_url, post . ap_id, post . local, post . embed_video_url, post . language_id, post . featured_community, post . featured_local, person . id, person . name, person . display_name, person . avatar, person . banned, person . published, person . updated, person . actor_id, person . bio, person . local, person . private_key, person . public_key, person . last_refreshed_at, person . banner, person . deleted, person . inbox_url, person . shared_inbox_url, person . matrix_user_id, person . admin, person . bot_account, person . ban_expires, person . instance_id, community . id, community . name, community . title, community . description, community . removed, community . published, community . updated, community . deleted, community . nsfw, community . actor_id, community . local, community . private_key, community . public_key, community . last_refreshed_at, community . icon, community . banner, community . followers_url, community . inbox_url, community . shared_inbox_url, community . hidden, community . posting_restricted_to_mods, community . instance_id, community . moderators_url, community . featured_url, community_person_ban . id, community_person_ban . community_id, community_person_ban . person_id, community_person_ban . published, community_person_ban . expires, post_aggregates . id, post_aggregates . post_id, post_aggregates . comments, post_aggregates . score, post_aggregates . upvotes, post_aggregates . downvotes, post_aggregates . published, post_aggregates . newest_comment_time_necro, post_aggregates . newest_comment_time, post_aggregates . featured_community, post_aggregates . featured_local, post_aggregates . hot_rank, post_aggregates . hot_rank_active, post_aggregates . community_id, post_aggregates . creator_id, community_follower . id, community_follower . community_id, community_follower . person_id, community_follower . published, community_follower . pending, post_saved . id, post_saved . post_id, post_saved . person_id, post_saved . published, post_read . id, post_read . post_id, post_read . person_id, post_read . published, person_block . id, person_block . person_id, person_block . target_id, person_block . published, post_like . score, coalesce ( ( post_aggregates . comments - person_post_aggregates . read_comments ), post_aggregates . comments ) 
    FROM ( ( ( ( ( ( ( ( ( ( ( ( post_aggregates 
        INNER JOIN person ON ( post_aggregates . creator_id = ? . id ) ) 
        INNER JOIN post ON ( post_aggregates . post_id = ? . id ) ) 
        INNER JOIN community ON ( post_aggregates . community_id = ? . id ) ) 
        LEFT OUTER JOIN community_person_ban ON ( ( post_aggregates . community_id = ? . community_id ) AND ( community_person_ban . person_id = ? . creator_id ) ) ) 
        LEFT OUTER JOIN community_follower ON ( ( post_aggregates . community_id = ? . community_id ) AND ( community_follower . person_id = ? ) ) ) 
        LEFT OUTER JOIN post_saved ON ( ( post_aggregates . post_id = ? . post_id ) AND ( post_saved . person_id = ? ) ) ) 
        LEFT OUTER JOIN post_read ON ( ( post_aggregates . post_id = ? . post_id ) AND ( post_read . person_id = ? ) ) ) 
        LEFT OUTER JOIN person_block ON ( ( post_aggregates . creator_id = ? . target_id ) AND ( person_block . person_id = ? ) ) ) 
        LEFT OUTER JOIN community_block ON ( ( post_aggregates . community_id = ? . community_id ) AND ( community_block . person_id = ? ) ) ) 
        LEFT OUTER JOIN post_like ON ( ( post_aggregates . post_id = ? . post_id ) AND ( post_like . person_id = ? ) ) ) 
        LEFT OUTER JOIN person_post_aggregates ON ( ( post_aggregates . post_id = ? . post_id ) AND ( person_post_aggregates . person_id = ? ) ) ) 
        LEFT OUTER JOIN local_user_language ON ( ( post . language_id = ? . language_id ) AND ( local_user_language . local_user_id = ? ) ) ) 
    WHERE ( ( ( ( ( ( ( ( community . removed = ? ) AND ( community . deleted = ? ) ) AND ( post . removed = ? ) ) AND ( post . deleted = ? ) ) AND ( community . local = ? ) ) AND ( ( community . hidden = ? ) OR ( community_follower . person_id = ? ) ) ) AND ( post . nsfw = ? ) ) AND ( community . nsfw = ? ) ) 
    ORDER BY post_aggregates . featured_local DESC, post_aggregates . hot_rank DESC, post_aggregates . published DESC 
    LIMIT ? OFFSET ?
    
  • RoundSparrow @ BT@bulletintree.comOPM
    link
    fedilink
    arrow-up
    1
    ·
    edit-2
    1 year ago

    Oblivious unpleasant answer to scaling

    I’m surprised lemmy.world hasn’t brought up one of the issues…

    Maybe they just keep overlooking. I don’t think it is the number of users that is as high as the amount of data.

    Archiving off PostgreSQL older data could help. Lemmy was designed with live-websockets and instant notification of every new comment… which worked fine with very little data in PostgreSQL.

    Lemmy.world database keeps growing. A quick-fix is to devise a way to move votes/posts/comments out of the tables say 45 days older. Even archive out person records once those posts associated with that person are moved out.

    The “dirty little secret” is that smaller instances people started 30 days ago run faster because THEY HAVE NO DATA in PostgreSQL!

    I’d rather we get specific AUTO_EXPLAIN details out of Lemmy.world, but it is maybe EASIER than it is getting people to fix Rust SQL ORM statements.

  • RoundSparrow @ BT@bulletintree.comOPM
    link
    fedilink
    arrow-up
    1
    ·
    1 year ago

    https://lemmy.ml/post/3024393

    People seem to have been given the impression that a full Lemmy instance is kind of like a “super client” to read content from.

    What is really needed is an API caching layer… off-loading from lemmy.world and other overloaded servers would be best if API clients didn’t download the same content over and over.

  • RoundSparrow
    link
    fedilink
    arrow-up
    1
    ·
    1 year ago

    I don’t mention it as often in public, but lemmy.world is unable to keep up with database size. Errors are common.

  • RoundSparrow @ BT@bulletintree.comOPM
    link
    fedilink
    arrow-up
    1
    ·
    edit-2
    1 year ago

    Also, a person can block a community and also subscribe to a community, and the query looks at these independent. ToDo: add testing code to confirm API behavior when a user blocks a community they are subscribed to and vise-versa.

    A denormalized table for person community_id to match multiple states could be of benefit. Especially in edge cases where a particular login account has a lot of blocks or subscribes to filter out posts and comments from.

  • RoundSparrow @ BT@bulletintree.comOPM
    link
    fedilink
    arrow-up
    1
    ·
    1 year ago

    API returning deleted comments

    For anonymous users, has lemmy_server main branch diverged in behavior from 0.18 branch? Need to review testing scripts and run against both.

  • RoundSparrow @ BT@bulletintree.comOPM
    link
    fedilink
    arrow-up
    1
    ·
    1 year ago

    I don’t like the JOIN approach lemmy uses. They may as well denormalize it all into one table with 200 fields for a post

    Back when I did SQL with DB2 and PostgreSQL and hardware was far more limited… I always thought defensively.

    If you have 11,000 posts in a community and you are requesting New/Hot/Active - do an IN clause that whacks them off at the pass.

    especially since most people are focused on hitting page 1 and maybe page 2 and 3. You gotta slice that data down. The problem exists on New, so there is no aging out with Hot/Active.

    Open-ended queries with LIMIT and JOIN can tangle up. Especially since post-aggregates id isn’t post_id?

    • RoundSparrow @ BT@bulletintree.comOPM
      link
      fedilink
      arrow-up
      1
      ·
      1 year ago

      What exactly is the relationship of post_aggregates and post?

      aggregates id: 503771 to post id 2133833. Both tables have 502204 records in them, equal. But why is post ID incrementing so far ahead? Is federation pounding on the index with failed INSERT statements?

      • RoundSparrow @ BT@bulletintree.comOPM
        link
        fedilink
        arrow-up
        1
        ·
        1 year ago

        CouldntCreatePost can be returned even on UPDATE of a created post. Is this wrapped in transaction?

        What exactly is going on with federated code and CouldntCreatePost?