We have a machine running some stuff on Docker, and little by little it has become important to keep an eye on it. However, everything I find about monitoring a Docker server seems to assume you're running it in Swarm mode, which is not and WILL NOT be the case for this machine. Swarm adds a layer of complexity that isn't needed here.

What do you recommend for this case? I for one would love it if the tool didn't just give you a view of what is running, but also sent notifications when something goes wrong (like a container having to be restarted, or one suddenly eating all the CPU, or anything else unusual).

  • @wjs018
    36 months ago

    I will be keeping an eye on this thread to see what other people do, but what I have done in the past is to have a couple different health checking strategies.

    • For web-accessible services I am running, I usually run something like Uptime Kuma or Gatus on a different box to make sure those web endpoints are available and performant. Lately I have been really digging how Gatus can check not just the response header, but also latency and certificate validity.
    • For the host machine, you can set up alerts within netdata for things like CPU utilization and memory, with custom thresholds. The only other solution I have used for this in the past is setting up alerts through my VPS provider (if it is a VPS, that is).
      • On really low-spec machines I have had trouble with netdata though, so I don't have a good alerting solution in those cases. Instead, I have resorted to just using dashdot as a PWA so that I can check it easily on my phone when I am on the go. Interested to see if there are less demanding options.
    • For some custom services in the past that run on set schedules, I have used healthchecks.io (which you can selfhost) to send alerts in the case that they don’t run for some reason.
    • As for the containers being restarted, I actually don’t have experience with that, so I am interested to see what others have done.
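On the container-restart question, one lightweight option that needs neither Swarm nor a full monitoring stack is to follow the event stream from the `docker events` CLI command and react to the interesting actions. Below is a minimal sketch of that idea in Python, not something anyone in this thread is actually running. The names `describe_event`, `watch`, and `ALERT_ACTIONS`, and the choice of which actions count as alert-worthy, are assumptions made for illustration. It also assumes the `docker` binary is on the PATH and that the user running the script is allowed to talk to the Docker daemon.

```python
import json
import subprocess

# Container actions treated as alert-worthy. This set is an assumption;
# trim or extend it for your own setup.
ALERT_ACTIONS = {"die", "oom", "restart"}


def describe_event(line):
    """Return an alert string for a noteworthy container event, else None."""
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(event, dict) or event.get("Type") != "container":
        return None

    action = event.get("Action", "")
    attrs = event.get("Actor", {}).get("Attributes", {})
    name = attrs.get("name", "unknown")

    if action in ALERT_ACTIONS:
        # "die" events carry the exit code among the actor attributes.
        exit_code = attrs.get("exitCode")
        suffix = f" (exit code {exit_code})" if exit_code is not None else ""
        return f"container {name}: {action}{suffix}"
    if action.startswith("health_status") and "unhealthy" in action:
        return f"container {name}: reported unhealthy"
    return None


def watch(notify=print):
    """Stream `docker events` as JSON lines and pass each alert to notify()."""
    cmd = ["docker", "events", "--filter", "type=container",
           "--format", "{{json .}}"]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            message = describe_event(line)
            if message:
                notify(message)
```

Calling `watch()` blocks and prints one line per alert. Passing your own `notify` callable (for example, one that posts to a webhook or sends an email) is where the actual notification would go. The `health_status` branch only fires for containers that define a `HEALTHCHECK`.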
    • Lupec
      36 months ago

      Gatus sounds pretty cool, I'll definitely give it a closer look later. Maybe it's the push I needed to go ahead and look into proper observability as a whole, log ingestion and whatnot. My homelab setup is sorely lacking in that department if I'm being honest lol