• RoundSparrowOPM
    link
    fedilink
    arrow-up
    2
    ·
    edit-2
    2 years ago

    Some thoughts on what to consider… again, we have a Integer primary key for each post, 1160776 being the current one…

    I would have a dirty-status database table (or even something outside the database) for each posting. Maybe a timestamp of when the post was last write touched. Edit to the post, vote change on post, vote change on comments in that post, deleted or edited comment, etc. (Twitter and Reddit in recent years now keep and expose traffic accounting of every time a post is read… that can be a lot more writing of data, but community mods and end-users may want to know this information.)

    You would use this dirty-status information as the basis for when a cached result is rebuilt from the live database. I would perhaps think in terms of 10 or 30 seconds, if a intermediate-cached result hasn’t been generated in over 30 seconds, you would rebuild it (new comments or vote count changes likely being the reason you would need to rebuilt it). You probably would want to try to do this in a more linear batch style, as you have to consider if your API is being crawled or heavily hit and you have concurrent posts all triggering rebuilds in a short period of time.

    I understand that in an ideal world you let the DBMS do all this kind of work for you intelligently. And that’s the approach Lenny is currently built on, that the DBMS is smart enough to intelligently manage buffers and caches internally to deal with lots of repeat-output activity. But classically, the database systems are not that smart and not really that fast at shuffling lots of text data through database drivers. That is kind pretty much why NoSQL database became so popular… performance issues (and predictability under surging loads).