I woke up this morning to a text from my ISP: “There is an outage in your area, we are working to resolve the issue.”
I laugh. This is what I live for! Almost all of my services are self-hosted, so I’m barely going to notice the difference!
Wrong.
When the internet went out, the power also went out for a few seconds. Four small computers host all of my services. Of those, one shut down and three rebooted. On the three that rebooted ungracefully, some services came back online and some didn’t.
30 minutes later, the ISP sends out a text that service is back online.
2 hours later, I’m still finding down services on my network.
Moral of the story: A UPS has moved to the top of the shopping list! Any suggestions??
When you are bored, back up a VM, then hard-kill it and see if it manages to restart properly.
Software should be able to recover from that.
If it doesn’t, troubleshoot.
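For the "see if it restarts properly" part, a minimal sketch of a post-reboot check, assuming each service listens on a TCP port. The service names, addresses and ports below are made up for illustration; swap in your own. An open port only shows that something is listening, not that the application behind it is healthy.

```python
import socket

# Hypothetical inventory (assumption): replace with your own hosts and ports.
SERVICES = {
    "reverse-proxy": ("192.168.1.10", 443),
    "dns": ("192.168.1.11", 53),
    "media": ("192.168.1.12", 8096),
}


def is_up(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def down_services(services, timeout=2.0):
    """Return the sorted names of services that did not accept a connection."""
    return sorted(
        name
        for name, (host, port) in services.items()
        if not is_up(host, port, timeout)
    )

# Usage after a hard kill and restart:
#     for name in down_services(SERVICES):
#         print(f"DOWN: {name}")
```

Run it a few minutes after the VM or host comes back, and anything it prints is where to start troubleshooting.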
That reminds me of Netflix’s Chaos Monkey (basically, during office hours this tool randomly kills stuff).
When I built my home server this is what I did with all VMs. Learned how to change the startup delay time in ESXi and ensured everything came back online with no issues from a cold boot.
Rip VMware.
While I appreciate the sentiment, most traditional VMs do not like to have their power killed (especially with non-journaling file systems).
Even crash-consistent applications can be impacted if the underlying host fs is affected by power loss.
I do think that backups are a valid suggestion here, provided that the backup isn’t interrupted by a power surge or loss.
Edit: even journaling file systems aren’t a magic bullet. I’ve had an ext4 fs get corrupted when IO was interrupted by power loss. I get the downvotes for mentioning non-journaling FS, but seriously folks, use the swiss cheese method of protecting your stuff: backups, redundant power/UPS, documented/automated installation/configuration.
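On the "backup wasn’t interrupted" point, one rough way to check that a file-level backup finished intact is to compare sizes and SHA-256 hashes of the source and the copy. This is a sketch under assumptions: the function names are made up, and it only makes sense if the source has not changed since the backup was taken (a powered-off VM disk or a snapshot, for example).

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large disk images do not need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(source, backup):
    """Return True if the backup exists and is byte-identical to the source."""
    source, backup = Path(source), Path(backup)
    if not backup.is_file():
        return False
    # Cheap check first: a truncated copy usually differs in size.
    if source.stat().st_size != backup.stat().st_size:
        return False
    return sha256_of(source) == sha256_of(backup)
```

A truncated copy from a power cut mid-backup fails the size check, and a same-size but damaged copy fails the hash comparison.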
Why are you using a non-journaling file system in 2024, when journaling ones have been common for 10+ years?
It’s been a while since a power cut affected my services, is this why?
I remember having to troubleshoot MySQL corruption following abrupt power loss. Is this no longer a thing?
Databases shouldn’t even need a journaling filesystem, they usually pay attention to when to use fsync and fdatasync.
In fact, journaling filesystems basically use the same mechanisms as databases, only for filesystem metadata.
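To make the fsync part concrete, here is a small sketch of the common write-temp-file, fsync, rename pattern. It is not how any particular database does it (real databases typically use write-ahead logs and are far more careful); the function name and the `.tmp` suffix are made up for illustration, and the directory fsync at the end assumes a POSIX system.

```python
import os


def durable_write(path, data):
    """Replace the file at `path` with `data` so a crash leaves old or new content."""
    tmp_path = path + ".tmp"  # hypothetical temp name, in the same directory
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()              # push Python's buffer to the OS
        os.fsync(f.fileno())   # ask the OS to push the data to stable storage
    # Atomic rename: readers see either the old file or the complete new one.
    os.replace(tmp_path, path)
    # fsync the directory so the rename itself survives a power cut (POSIX).
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The point is the ordering: the data is flushed before the rename makes it visible, so a power cut in the middle should not leave a half-written file under the real name. This still depends on the drive honoring flush requests.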
Or even better, use something like ZFS with CoW, which is designed so the on-disk state stays consistent across power loss, and don’t fuck with sync writes.
I would still consider that generation of filesystem to take effort to use, while regular journaling filesystems are so ubiquitous that you need to invest effort to avoid using one.
It was supported and the default out of the box when I installed my OS
Maybe that is the case on some distros if you install a recent version. But to get a non-journaling filesystem, you literally have to partition manually to avoid using a journaling one, on any distro that is still supported today and meant for full-sized PCs (as opposed to embedded devices).
Are you talking about Linux distros? What manual partitioning has to occur?
If you want to use a filesystem that is so bad that it doesn’t even have journaling, you need to manually select it. No distro has used one of those by default for 10-15 years now.
Your system should be fine after a hard kill. If it’s not, stop using it, as that’s going to be a problem down the road.