cross-posted from: https://lemmy.cloudhub.social/post/14149

What’s everyone using for status monitoring and/or status pages either in their lab or at work?

I set up a status page for my fediverse instances using Uptime Robot (I have an existing subscription), and the features are kinda lacking. I feel like they haven’t really updated anything in the last 5 years, which is unfortunate.

  • redcalcium@c.calciumlabs.com
    1 year ago

    I run Vigil in a separate small VM. Vigil’s features really suit my needs:

    • when a service is down, it first notifies you via email (I use Mailgun). If the service is still down after an extended period, it starts texting you (via Twilio).
    • it has the usual ping/HTTP checks to see if your services are up
    • it can even monitor services that aren’t reachable from the Vigil instance (e.g. services that are only accessible from the local network) by using Agents. In addition to ping/HTTP checks, an agent can also run arbitrary commands, so it can basically be used to monitor the uptime of anything
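
    To make the email-first, text-later escalation concrete, here is a minimal sketch of the idea in Python. It is not how Vigil is implemented or configured; the names (`ServiceState`, `next_alerts`) and the 10-minute SMS threshold are assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed threshold: text only after 10 minutes of continuous downtime.
SMS_AFTER_SECONDS = 600


@dataclass
class ServiceState:
    """Hypothetical per-service bookkeeping between checks."""
    down_since: Optional[float] = None
    emailed: bool = False
    texted: bool = False


def next_alerts(state: ServiceState, is_up: bool, now: float) -> List[str]:
    """Return the alert channels to fire for this check result."""
    if is_up:
        # Only announce a recovery if an outage was actually in progress.
        recovered = state.down_since is not None
        state.down_since = None
        state.emailed = False
        state.texted = False
        return ["recovery"] if recovered else []

    if state.down_since is None:
        state.down_since = now  # first failed check of this outage

    alerts: List[str] = []
    if not state.emailed:
        state.emailed = True
        alerts.append("email")  # first line of notification
    if not state.texted and now - state.down_since >= SMS_AFTER_SECONDS:
        state.texted = True
        alerts.append("sms")  # escalate once the outage has lasted long enough
    return alerts
```

    Calling `next_alerts` after every ping/HTTP check yields one email at the start of an outage, one text once it passes the threshold, and one recovery notice when the service comes back.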

    The drawback of running your own monitoring service is that the monitoring service itself can go down. That has happened to me several times, and each time I was spammed with DEAD email alerts, each immediately followed by a HEALTHY alert email.

    Edit: now that I think about it, I’ll probably need to add my monitoring service into a monitoring service to monitor the monitoring service’s uptime.
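
    One way to sketch that "monitor the monitor" idea is a heartbeat check run from a second machine: the monitoring VM records a timestamp on every check cycle, and the outside watcher raises an alarm if that timestamp gets too old. This is only an illustration; `monitor_is_stale` and the 3-minute limit are assumptions, not a feature of Vigil or Uptime Robot.

```python
from typing import Optional

# Assumed limit: treat the monitor as dead after 3 minutes without a heartbeat.
MAX_SILENCE_SECONDS = 180.0


def monitor_is_stale(
    last_heartbeat: Optional[float],
    now: float,
    max_silence: float = MAX_SILENCE_SECONDS,
) -> bool:
    """Return True if the monitoring service has been silent for too long."""
    if last_heartbeat is None:
        # No heartbeat ever received counts as stale.
        return True
    return (now - last_heartbeat) > max_silence
```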