Hi all. I’ve been having some problems keeping fedia.io running. At the moment, either the message workers or the PHP web server processes die after an hour or so, and I have to restart everything. I have been working with the mbin team and installed some updates that we hoped would fix the problems, but no luck. I am going to work on a cron job to automatically restart things once an hour. The downside is that you’ll likely see some 500 errors if you happen to hit the site while the processes are restarting, but the restart should be quick, and refreshing the page should make it work again.
OK - I took a slightly different approach. Since I know which error in rabbitmq’s log file is associated with things coming to a stop on fedia.io, I installed swatchdog and set it up to watch for that word (which is, btw, “timeout”). I created a script that stops all the messengers, then stops php-fpm, keydb, and rabbitmq. Then it starts rabbitmq, keydb, and php-fpm, in that order. Finally, it restarts the messengers.
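For anyone curious about the ordering, here is a rough Python sketch of the stop/start sequence described above. It is not the actual script running on fedia.io. The unit names (`mbin-messenger`, `php-fpm`, `keydb`, `rabbitmq`) and the use of `systemctl` are assumptions for illustration, since how the services are managed isn't stated here, and `line_triggers_restart` only mimics the "look for the word timeout" check that swatchdog performs.

```python
import subprocess

# Assumed unit names: the services are named in the post, but how they are
# managed (systemd, supervisor, ...) is not, so treat these as placeholders.
MESSENGERS = ["mbin-messenger"]               # hypothetical unit for the message workers
BACKENDS = ["php-fpm", "keydb", "rabbitmq"]   # listed in the order they are stopped


def restart_plan(messengers=MESSENGERS, backends=BACKENDS):
    """Return the ordered (action, unit) steps: stop everything, then start in reverse."""
    plan = [("stop", m) for m in messengers]            # messengers go down first
    plan += [("stop", b) for b in backends]             # then php-fpm, keydb, rabbitmq
    plan += [("start", b) for b in reversed(backends)]  # rabbitmq, keydb, php-fpm
    plan += [("start", m) for m in messengers]          # messengers come back last
    return plan


def line_triggers_restart(line, keyword="timeout"):
    """True if a log line contains the keyword being watched for."""
    return keyword in line


def run_plan(plan, dry_run=True):
    """Print each command, or run it when dry_run is False (assumes systemctl)."""
    for action, unit in plan:
        cmd = ["systemctl", action, unit]
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Calling `run_plan(restart_plan())` with the default `dry_run=True` only prints the commands, which is a safe way to check the order before letting anything touch the real services.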
I will be surprised if it works the first time, so it may still crash again, but I’ll be watching.