Hey everyone. If you want to post links or discuss the Reddit blackout, please localize it to this thread in order to keep things tidy!
As an engineer, this sounds most plausible - they had proactive detection and resolution in place against various attacks and system failures, which got triggered due to the massive drop in public subreddits/users/activity, and made everything worse. Honestly, this isn’t a scenario their engineers could have easily predicted…
As a former sysadmin and a [still, for the moment] reddit moderator, my bet is that most of the subreddits that switched to private forgot to (or didn’t know to) go into “new reddit” and switch off the thing that allows people to request being added to the now-private subreddit.
A HUGE influx of people pounding on the “let me in, add me to the sub” button, which sends modmail, may have overloaded the whole modmail system, which in turn sometimes goes kaflooey for no apparent reason (my theory is: it gets bored).
I see this as a positive aspect of the protest.
I am also amused that random people are pounding on the door for access, as if they think approved submitters are having a private tea party inside.
I’ve had some of those. I’ve been responding with a link to Louis Rossmann’s video and “Please consider limiting your own reddit use.”
Clearly you’re not someone who would have to go back and clear out 259238 modmail messages and make sure that none of them are legit “I have a problem” notes.
None of the subreddits I mod are that huge but just the thought of more than 100 at once makes me wanna cry.
At this point, they should just leave the 259,238 modmail messages for the admins to deal with. Let them sort through all that since this is all their doing.
Oh clearly I’m not. I just don’t understand the thinking of people demanding access. It’s like the kind of person who pounds on the door of a closed restaurant because they can see the employees inside.
People are selfish. People subconsciously think the rules apply to other people.
People who demand to come into closed stores and restaurants are not the exception. What’s even crazier is when you turn one away, anyone who has seen the door open even though the person was told no and didn’t get inside suddenly decides that maybe if THEY pound on the door, they’ll magically get access!
Oh man, my partner made a somewhat popular weapon calculator spreadsheet for Elden Ring, and the number of random Google Sheets edit requests they received was… quite a lot. (the instructions were right there for people to make a copy of the sheet to edit themselves! that’s how all of these sheets calculators work!) 🤦
Ah, but you see they “improved” modmail recently. It would certainly never go “kaflooey” anymore. It now fails all like “kerpow!” instead… much cooler, you see.
Well, of course, that’s just good engineering.
You see, kerpow!s scale much better than kaflooeys due to cache invalidation problems in the ooey inductors, that’s like first semester knowledge.
I’m just speculating of course, too, but could be some kind of sharding e.g. in the DB level. I can imagine the little subreddits draw little traffic hence fewer shards are allocated to them (like how S3 works).
I’m not sure if it’s just a load balancing issue. if all of Reddit can only access specific subs, maybe they split their servers that way
but I’m just guessing, because it doesn’t make much sense to go down, when there is less data to process…
in a way it does, when you’re building massive scale systems. Say you are the mitigation team and want to protect yourself against a malicious hacker/employee that starts shutting down web servers or removes posting permissions from the DB for everyone. You’re going to monitor the frequency of posts and if it drops too fast, you know something bad is happening. You’re going to take automated measures against it - maybe freeze access to the DB completely, maybe switch to a (much less tested) backup region/system, etc… so you can see how things can snowball from there to strange scenarios…
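To make the "monitor the frequency of posts" idea concrete, here is a rough sketch of that kind of drop detector. Everything in it is made up for illustration (the `PostRateMonitor` name, the window size, the 50% threshold); nobody outside Reddit knows what their tooling actually looks like.

```python
from collections import deque


class PostRateMonitor:
    """Hypothetical detector: flags a sample that falls far below the recent average."""

    def __init__(self, window=5, drop_ratio=0.5):
        self.window = window          # how many recent samples form the baseline
        self.drop_ratio = drop_ratio  # alarm if a sample is below baseline * drop_ratio
        self.samples = deque(maxlen=window)

    def observe(self, posts_per_minute):
        """Record a sample; return True if it looks like a suspicious drop."""
        alarm = False
        if len(self.samples) == self.window:
            baseline = sum(self.samples) / self.window
            alarm = posts_per_minute < baseline * self.drop_ratio
        self.samples.append(posts_per_minute)
        return alarm


monitor = PostRateMonitor(window=3, drop_ratio=0.5)
warmup = [monitor.observe(rate) for rate in (100, 100, 100)]  # building the baseline
blackout_alarm = monitor.observe(40)  # sudden drop, e.g. thousands of subs going private
```

The point is that a detector like this cannot tell "attacker revoked everyone's posting permissions" apart from "users stopped posting on purpose". Both look like the same drop, so whatever automated mitigation hangs off the alarm would fire either way.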
yeah, well, maybe…
usually unexpected situations have unexpected errors. so yeah, you could be right
This makes a lot of sense to me (as an Operations Engineer).
I could imagine the architecture team has low watermark triggers to rescale the architecture, kill and restore hosts, or other changes based on expected user load. When that load just… isn’t there, the automated tooling just loops the same actions causing site instability.
I’ve had similar issues before, so it seems like a feasible explanation
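For anyone wondering what a low watermark trigger misbehaving might look like, here is a toy simulation. It is purely a sketch under my own assumptions (the 30% watermark, halving the fleet, the `simulate` function and its parameters are all invented), not a description of Reddit's real setup.

```python
LOW_WATERMARK = 0.3  # assumed: scale down when utilization falls below 30%


def simulate(hosts, load, steps, capacity_per_host=100, min_hosts=0):
    """Run a naive scale-down loop and return the host count after each cycle."""
    history = [hosts]
    for _ in range(steps):
        if hosts == 0:
            break  # nothing left to serve traffic (and nothing left to scale)
        utilization = load / (hosts * capacity_per_host)
        if utilization < LOW_WATERMARK:
            # The trigger fires again every cycle while the load stays away.
            hosts = max(hosts // 2, min_hosts)
        history.append(hosts)
    return history


no_floor = simulate(hosts=64, load=0, steps=10)
with_floor = simulate(hosts=64, load=0, steps=5, min_hosts=8)
normal_day = simulate(hosts=64, load=3200, steps=3)
```

With no floor, the loop keeps halving the fleet until there are no hosts left, because the load it is waiting for never shows up. A minimum host count stops the bleeding, but the trigger itself still fires on every cycle, which is the "looping the same actions" part.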