• RoundSparrow
    link
    fedilink
    arrow-up
    1
    ·
    1 year ago

    I’ve revisited this GItHub discussion now at least 5 times in the past 4 weeks.

    this needs major editing, but throwing it into raw journal:


    Time based queries. IE return all results within a week or a month, where page 1 is < 1 week, page 2 > 1 week and < 2 weeks, etc.

    I think time is really important to the whole application, because user activity is almost always centered around a small number of postings in the total database.

    For purposes of discussion, assume voting isn’t real time, because votes are a write activity that you may want to prioritize (meaning shed features under heavy load or minimize server resource usage if a person wants to operate that way).

    If you can really focus on user activity, comments and postings have pretty strong patterns of what is “current”. Most people don’t go past into posts that are more than a few days old. You have to worry about search engines and crawlers hitting all your old content and working the database, but really the times you need the performance most - you really are working with a small percentage of the total posts and comments in the system.

    Select the entire tables in code, sort by time, and throw out results above the LIMIT. Probably incredibly slow, and taxing to the DB.

    The comments table alone is too much data. But I do think you could select entire POSTINGS with all their comments, and cache them in an intermediate format so you do not have to go to SQL for every single page refresh. And some timer-based invalidation (say 1 minute) to rebuild that cached data.

    I remember when Reddit used to archive all posts after 6 months, they eventually got their performance so good on when to rebuild pages that they didn’t do that. But I think their approach was a good direction as an immediate development goal.