A major problem with a data-centered application like Lemmy is that you can run into cache misses pretty easily.

Denial-of-Service by Requesting Old Postings

It’s a denial-of-service possibility that someone can run a web crawler/API crawler over your postings, with the server fetching posts and comments from 3 months ago. Intentional or otherwise, this busies up your server and pushes old content into the database cache, evicting the fresh content most readers actually want.

On a Lemmy/Reddit-style media forum, changes to the data tend to slow way down after the first 48 hours of a posting, as end-users move on to the stream of newer postings. This favors having some concept of inactive vs. active posts, which informs how you choose between caching pre-generated output and running live database queries for real-time data.
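The active/inactive distinction above can be sketched as a simple age check. The 48-hour cutoff comes from the paragraph above; the function and constant names are hypothetical, not anything from the Lemmy code base:

```rust
use std::time::{Duration, SystemTime};

// Hypothetical activity cutoff: posts older than 48 hours are treated
// as "inactive" and become eligible for static, outside-the-DB caching.
const ACTIVE_WINDOW: Duration = Duration::from_secs(48 * 60 * 60);

fn is_active(published: SystemTime) -> bool {
    match SystemTime::now().duration_since(published) {
        Ok(age) => age < ACTIVE_WINDOW,
        // A timestamp in the future (clock skew) is safest to treat as active.
        Err(_) => true,
    }
}

fn main() {
    let three_days_ago = SystemTime::now() - Duration::from_secs(72 * 60 * 60);
    assert!(!is_active(three_days_ago));
    assert!(is_active(SystemTime::now()));
    println!("ok");
}
```

A real system would likely also flip a post back to active if a moderator reopens it, but the core decision is just this age test.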

For the inactive postings, you may want a caching layer that lives entirely outside the database and favors avoiding it. For example: limiting the sort options offered to end-users on inactive posts, and storing pre-fetched content directly in files on the server’s disk that can be loaded instead of assembling the output from the live database on each API/app query.
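A minimal sketch of that disk-first path, assuming a hypothetical cache directory layout and a stubbed-out database renderer (both names are invented for illustration, not Lemmy's actual code):

```rust
use std::fs;
use std::path::PathBuf;

// Hypothetical on-disk cache of pre-rendered post JSON, keyed by post id.
fn cached_post_path(cache_dir: &str, post_id: u64) -> PathBuf {
    PathBuf::from(cache_dir).join(format!("post_{post_id}.json"))
}

// Stand-in for assembling the response from the live database.
fn render_from_database(post_id: u64) -> String {
    format!("{{\"post_id\":{post_id},\"comments\":[]}}")
}

// Serve an inactive post from its disk snapshot if one exists; only a
// cache miss touches the database, and the miss writes the snapshot so
// the next reader skips the database entirely.
fn load_inactive_post(cache_dir: &str, post_id: u64) -> String {
    match fs::read_to_string(cached_post_path(cache_dir, post_id)) {
        Ok(json) => json,
        Err(_) => {
            let json = render_from_database(post_id);
            let _ = fs::create_dir_all(cache_dir);
            let _ = fs::write(cached_post_path(cache_dir, post_id), &json);
            json
        }
    }
}

fn main() {
    let dir = std::env::temp_dir().join("lemmy_cache_demo");
    let dir = dir.to_string_lossy().into_owned();
    let first = load_inactive_post(&dir, 42);  // miss: hits the "database"
    let second = load_inactive_post(&dir, 42); // hit: read straight from disk
    assert_eq!(first, second);
    println!("ok");
}
```

Restricting inactive posts to one or two sort orders is what makes this workable: each allowed ordering can be snapshotted once, instead of supporting arbitrary live queries.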

Major Social Events

As a server operator, you might desire fallback behavior during surges. September 11, 2001 was an example where major social media websites and even TV news websites all fell over due to everyone flocking to get news and make comments about the event.

Do you want your social media system to show ‘internal server error’, or to degrade more gracefully into a read-only mode? In read-only mode you could probably still process logins and render the existing postings and comments, while not allowing votes and new comments to flood into the database.
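One way to sketch such a switch, using a hypothetical global flag and request type (none of this is Lemmy's actual API), is to gate only the write paths:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical operator-controlled switch, flipped during a surge.
static READ_ONLY: AtomicBool = AtomicBool::new(false);

enum Request {
    ReadPost(u64),
    CastVote(u64, i8),
}

// Reads keep working (ideally from cache); writes get a graceful
// refusal instead of an 'internal server error'.
fn handle(req: Request) -> Result<&'static str, &'static str> {
    let read_only = READ_ONLY.load(Ordering::Relaxed);
    match req {
        Request::ReadPost(_) => Ok("rendered from cache"),
        Request::CastVote(..) if read_only => Err("read-only mode, try again later"),
        Request::CastVote(..) => Ok("vote recorded"),
    }
}

fn main() {
    READ_ONLY.store(true, Ordering::Relaxed);
    assert!(handle(Request::ReadPost(1)).is_ok());
    assert!(handle(Request::CastVote(1, 1)).is_err());
    println!("ok");
}
```

The important design choice is that the flag is checked per request, so the operator can flip it at runtime without restarting the server.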

Partly the assumption is that a community like Lemmy doesn’t have a huge budget or the ability to spin up dozens of new servers the way Twitter, Reddit, or Facebook can… so building ‘fallback’ behavior switches into the code may be more important for these owner/operators than for the big, well-funded players.

NOTE: I have autism and often struggle with my language; this needs some editing, so please excuse me if I’m being confusing or not providing clear enough examples. This is a starting point for expressing an idea rather than a well-written posting.

  • RoundSparrow (OP) · 1 year ago
    When I talk about having intermediate caching of the comment data, such as disk files or a NoSQL database, there is an implied assumption that people reading the website are far more common than people commenting or posting. Voting can be tricky to record, and I suggest that votes be queued to an app that updates the database in batches, as opposed to live-updating the database on every vote by end-users.
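A minimal sketch of that vote-queueing idea: net score deltas accumulate in memory and are flushed as one batch per post, rather than one database transaction per individual vote. The types are hypothetical, and a real worker would turn the flushed batch into SQL:

```rust
use std::collections::HashMap;

// Hypothetical in-memory vote queue: post_id -> net score delta.
struct VoteQueue {
    pending: HashMap<u64, i64>,
}

impl VoteQueue {
    fn new() -> Self {
        Self { pending: HashMap::new() }
    }

    // Called on every end-user vote; no database access here.
    fn enqueue(&mut self, post_id: u64, delta: i64) {
        *self.pending.entry(post_id).or_insert(0) += delta;
    }

    // Drained periodically by a worker that would apply each
    // (post_id, delta) pair as a single batched UPDATE.
    fn flush(&mut self) -> Vec<(u64, i64)> {
        self.pending.drain().collect()
    }
}

fn main() {
    let mut q = VoteQueue::new();
    q.enqueue(7, 1);
    q.enqueue(7, 1);
    q.enqueue(7, -1);
    assert_eq!(q.flush(), vec![(7, 1)]); // three votes collapse to one write
    assert!(q.flush().is_empty());
    println!("ok");
}
```

The trade-off is that displayed scores lag by one flush interval; for a read-heavy site, that staleness is usually invisible to end-users.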

    Another assumption is that the app could scale by having multiple servers run the Lemmy API against a single (network-local) PostgreSQL instance. I currently have no idea of the history or practicality of this in the current code base. But generally you strive for the DBMS to be the one to deal with the data in a consistent, transactional way, with rendering and caching done inside the (Lemmy) API application.

    All this discussion assumes you are rolling your own custom programming to do this, not using some kind of intermediate off-the-shelf caching layer.