Trying to understand the different selfhosted monitoring solutions

dr_robot@kbin.social · 1 year ago

roofuskit@kbin.social · 1 year ago

Netdata is great and easily deployed via Docker. I’ve also run it on bare metal and was just as pleased, if that’s your preference.

Encrypt-Keeper@lemmy.world · 1 year ago

Netdata, when it works, is pretty great. However, it tends to eat up the RAM of whatever I put it on until the whole server stops responding. If they fixed whatever caused… that, I would totally still be using it.

ninchuka@lemmy.one · 1 year ago

I’ve never had that with the few systems I’ve run it on.

roofuskit@kbin.social · 1 year ago

I’ve also never had that issue. It’s had quite a few updates since I started using it.

knaak@lemmy.world · 1 year ago

I have been using Uptime Kuma for internal monitoring and Uptime Robot for external.

I like the combination and it seems like what you are looking for.

https://github.com/louislam/uptime-kuma https://uptimerobot.com
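At its core, an internal up/down check of the kind these tools run answers "can I connect, and how long did it take?". A minimal Python sketch of a TCP port check follows. It is not how Uptime Kuma or Uptime Robot are implemented; the `check_tcp` helper and the target list are hypothetical.

```python
import socket
import time


def check_tcp(host: str, port: int, timeout: float = 3.0) -> tuple[bool, float]:
    """Return (is_up, latency_seconds) for a plain TCP connect to host:port."""
    start = time.monotonic()
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError:
        # Refused, timed out, unreachable, or DNS failure all count as "down".
        return False, time.monotonic() - start


if __name__ == "__main__":
    # Hypothetical internal targets; replace with your own hosts/ports.
    targets = [("127.0.0.1", 22), ("127.0.0.1", 19999)]
    for host, port in targets:
        up, latency = check_tcp(host, port)
        print(f"{host}:{port} {'UP' if up else 'DOWN'} ({latency * 1000:.1f} ms)")
```

Uptime Kuma layers scheduling, history and notifications on top of checks like this; running the same kind of check from outside your network also catches failures of the uplink itself.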

Rockslide0482@discuss.tchncs.de · 1 year ago

Seconded for simplicity. If OP is looking for complex statistics, it may not do the trick, but it’s about as straightforward and quick to set up as a monitoring solution can get.

vegetaaaaaaa@lemmy.world · 1 year ago

> I am more interested in being able to observe metrics for each node individually rather than in aggregate.

This requirement makes me think netdata would be a good solution. In my current setup, each host has its own netdata dashboard and manages its own health checks/alarms. I have also enabled streaming which sends metrics from all hosts to a “parent/master” netdata instance from which I can see all metrics from all hosts without checking each dashboard individually.
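For reference, netdata streaming is configured in `stream.conf` (typically under /etc/netdata/): the child gets a `[stream]` section with `enabled`, `destination` and `api key`, and the parent gets a section named after that same API key. The sketch below generates a matching pair with Python's `configparser`. The `stream_conf_pair` helper and the parent address are hypothetical, and the generated files are not indented like netdata's shipped examples, which is assumed to be cosmetic.

```python
import configparser
import io
import uuid


def _render(cfg: configparser.ConfigParser) -> str:
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


def stream_conf_pair(parent_host: str, parent_port: int = 19999) -> tuple[str, str]:
    """Return (child_stream_conf, parent_stream_conf) sharing one API key."""
    api_key = str(uuid.uuid4())  # netdata streaming API keys are UUIDs

    # Child side: enable streaming and point it at the parent.
    child = configparser.ConfigParser()
    child["stream"] = {
        "enabled": "yes",
        "destination": f"{parent_host}:{parent_port}",
        "api key": api_key,
    }

    # Parent side: a section named after the API key accepts children using it.
    parent = configparser.ConfigParser()
    parent[api_key] = {"enabled": "yes"}

    return _render(child), _render(parent)


if __name__ == "__main__":
    child_conf, parent_conf = stream_conf_pair("10.0.0.5")  # hypothetical parent IP
    print(child_conf)
    print(parent_conf)
```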

> However, it looks like it does not store the metrics for very long.

I still have to look into this. In the past it was certainly true, and you had to set up a Prometheus instance to store metrics for long-term archival (and to downsample them: who needs few-second resolution for year-old metrics?). But looking at the documentation right now, it seems possible to store long-term metrics in the netdata DB itself by moving old metrics to a lower-resolution storage tier: https://learn.netdata.cloud/docs/configuring/optimizing-metrics-database/change-how-long-netdata-stores-metrics
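The idea behind tiered retention is to keep full-resolution samples only briefly and keep coarser aggregates for much longer. A minimal Python sketch of one downsampling step follows: per-second samples collapse into per-minute buckets that keep min, max and average. It illustrates the concept only and is not netdata's actual storage engine; the `downsample` function and `Bucket` type are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    start: int      # bucket start time (unix seconds)
    count: int
    total: float
    minimum: float
    maximum: float

    @property
    def average(self) -> float:
        return self.total / self.count


def downsample(samples: list[tuple[int, float]], width: int) -> list[Bucket]:
    """Collapse (timestamp, value) samples into fixed-width time buckets.

    Keeping min/max alongside the average means short spikes stay
    visible even after the per-second detail is thrown away.
    """
    buckets: dict[int, Bucket] = {}
    for ts, value in samples:
        start = ts - ts % width  # align to the bucket boundary
        b = buckets.get(start)
        if b is None:
            buckets[start] = Bucket(start, 1, value, value, value)
        else:
            b.count += 1
            b.total += value
            b.minimum = min(b.minimum, value)
            b.maximum = max(b.maximum, value)
    return [buckets[k] for k in sorted(buckets)]


if __name__ == "__main__":
    # Two minutes of fake per-second CPU readings with one spike.
    raw = [(t, 10.0) for t in range(120)]
    raw[75] = (75, 95.0)
    for minute in downsample(raw, 60):  # per-minute tier
        print(minute.start, round(minute.average, 2), minute.maximum)
    # A per-hour tier could be built the same way with downsample(raw, 3600).
```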

> An important additional advantage is that it comes packaged on Debian (all my machines run Debian).

Same. However, I install and update it from their third-party APT repository. It’s one of the rare cases where I prefer upstream releases to Debian stable packages: the last few upstream releases have been really nice (for example, I’m not sure the new tiered retention system is available in the v1.37.1 Debian stable package).

My automated installation procedure (ansible role) is here if you’re interested (start at tasks/main.yml and follow the import_tasks).

dr_robot@kbin.social · 1 year ago

Thanks a lot for these tips! Especially about using the upstream deb.

Encrypt-Keeper@lemmy.world · 1 year ago

I’m a fan of Zabbix. I’ve used it in a datacenter environment, but it’s much easier to configure than Icinga/Nagios and not as hacky as Prometheus/Grafana.

BluSpoon@lemm.ee · 1 year ago

Personally, I opt for Zabbix, but I’ve been working with it for years. Simple deployment, lots of support, it just works.

seba@lemmy.world · 8 months ago

Hi! I’m also trying to navigate the monitoring solutions; I thought it would be easier… :( Maybe someone has a recommendation:

I’m looking for a lightweight tool for my personal home lab (an Ionos VPS with 2 GB RAM and 2 CPUs), so there’s no need for scalability, big data, etc. I’m experimenting with some services (syncthing, silverbullet-md, wireguard) and there is not much RAM left for anything else. I’ve been reading about Prometheus+Grafana, but it sounds like overkill, as do checkMk, Zabbix, Graphite, netdata…

I mostly need the status of the hardware (RAM+CPU) and of the running containers. Ideally, I could see a few days of history in a web-based dashboard.

Currently I’m using Glances because it was easy to install and very lightweight but if I want to visualize the persisted data, I need something like Grafana, etc.

(Sorry for the long comment; I wasn’t sure whether I should start a new post.) Thanks a lot 🤓
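If even Grafana feels too heavy for a 2 GB VPS, one very small option is to sample `/proc` periodically and keep a few days of rows in SQLite, which a dashboard or a quick script can read later. The sketch below is an illustration under assumptions: Linux only, a 3-day window, and hypothetical function names. It does not cover container status.

```python
import sqlite3
import time

RETENTION_SECONDS = 3 * 24 * 3600  # keep "a few days" of history (assumed: 3)


def mem_used_percent(meminfo: str) -> float:
    """Compute used-memory % from the text of /proc/meminfo (values in kB)."""
    fields = {}
    for line in meminfo.splitlines():
        key, _, rest = line.partition(":")
        if rest:
            fields[key] = int(rest.split()[0])
    total = fields["MemTotal"]
    available = fields["MemAvailable"]  # kernel's estimate of usable memory
    return 100.0 * (total - available) / total


def load1(loadavg: str) -> float:
    """First field of /proc/loadavg: the 1-minute load average."""
    return float(loadavg.split()[0])


def record(db: sqlite3.Connection, ts: int, mem_pct: float, load: float) -> None:
    db.execute("CREATE TABLE IF NOT EXISTS samples (ts INTEGER, mem_pct REAL, load1 REAL)")
    db.execute("INSERT INTO samples VALUES (?, ?, ?)", (ts, mem_pct, load))
    # Drop anything older than the retention window so the file stays small.
    db.execute("DELETE FROM samples WHERE ts < ?", (ts - RETENTION_SECONDS,))
    db.commit()


def sample_once(db: sqlite3.Connection) -> None:
    """Read /proc once and store one row (Linux only)."""
    with open("/proc/meminfo") as f:
        mem = mem_used_percent(f.read())
    with open("/proc/loadavg") as f:
        load = load1(f.read())
    record(db, int(time.time()), mem, load)
```

Calling `sample_once(sqlite3.connect("metrics.db"))` once a minute from cron or a systemd timer would keep roughly 4,300 rows for three days, which costs almost nothing in RAM or disk.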

pim@feddit.nl · 1 year ago

Currently setting up my own monitoring stack:

Fluentbit to gather metrics/logs
Fluentd to aggregate logs
Elasticsearch as database
Grafana for visualization

grahamsz@kbin.social · 1 year ago

Since I’m already running it for other things, I’ve been running stuff through Home Assistant and using Lovelace dashboards.

ptman@sopuli.xyz · 1 year ago

I’ve used nagios, check_mk and zabbix, and I’m currently using prometheus + grafana. I suggest prometheus + grafana, but you may want to use netdata as the exporter instead of node_exporter. Or both.
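Using netdata as the exporter works because netdata serves its metrics in Prometheus text format at `/api/v1/allmetrics?format=prometheus`. A sketch of a matching `scrape_configs` section follows, scraping both node_exporter (default port 9100) and netdata (default port 19999). It is rendered from Python only for illustration; the host names are hypothetical, and in practice this would more likely be a template in your config management.

```python
def scrape_config(hosts: list[str]) -> str:
    """Render a prometheus.yml scrape_configs section for the given hosts.

    Two jobs: node_exporter on its default port 9100, and netdata's
    Prometheus-format endpoint on netdata's default port 19999.
    """
    def targets(port: int) -> str:
        return ", ".join(f"'{h}:{port}'" for h in hosts)

    return "\n".join([
        "scrape_configs:",
        "  - job_name: 'node'",
        "    static_configs:",
        f"      - targets: [{targets(9100)}]",
        "  - job_name: 'netdata'",
        "    metrics_path: '/api/v1/allmetrics'",
        "    params:",
        "      format: [prometheus]",
        "    static_configs:",
        f"      - targets: [{targets(19999)}]",
        "",
    ])


if __name__ == "__main__":
    # Hypothetical host names; the output would go into prometheus.yml.
    print(scrape_config(["web01", "db01"]))
```

Because the result is a plain text file, it can be reviewed and versioned like any other config.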

dr_robot@kbin.social · 1 year ago

Thanks for your reply! Out of curiosity, what made you go with Prometheus over zabbix and check_mk in the end? Those two seem to be heavily recommended.

ptman@sopuli.xyz · 1 year ago

nagios (and check_mk) are plain old tech; newer tools have been built with the lessons learned. I don’t like zabbix because its configuration is stored in a database. prometheus is nice because it’s performant and its configuration is in a file (which can be version controlled in git and deployed with e.g. ansible): data in a database, config in plain text files.

Dran@lemmy.world · 1 year ago

check_mk is what I use at home and at work. It’s a fork of nagios/icinga and works with agents, nagios plugins, or SNMP, and if you somehow can’t find what you want to monitor, writing custom checks is as easy as writing a bash script.
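Checkmk’s agent supports so-called local checks: any executable in the agent’s `local/` directory (typically /usr/lib/check_mk_agent/local/ on Linux) whose output lines have the form `<state> <service_name> <metrics> <summary>`, with state 0=OK, 1=WARN, 2=CRIT, 3=UNKNOWN. The same output contract works from any language; here it is sketched in Python as a disk-usage check, with hypothetical thresholds and service name.

```python
#!/usr/bin/env python3
"""Checkmk local check sketch: prints one status line for disk usage of /."""
import shutil

WARN_PCT = 80.0  # hypothetical thresholds
CRIT_PCT = 90.0


def disk_check_line(path: str, used: int, total: int) -> str:
    pct = 100.0 * used / total
    if pct >= CRIT_PCT:
        state = 2
    elif pct >= WARN_PCT:
        state = 1
    else:
        state = 0
    # Service names cannot contain spaces unless quoted, so use underscores.
    name = "Disk_usage_" + (path.strip("/").replace("/", "_") or "root")
    # Metric syntax: name=value;warn;crit
    metric = f"used_pct={pct:.1f};{WARN_PCT:.0f};{CRIT_PCT:.0f}"
    return f"{state} {name} {metric} {pct:.1f}% of {path} used"


if __name__ == "__main__":
    usage = shutil.disk_usage("/")
    print(disk_check_line("/", usage.used, usage.total))
```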

1 year ago

I’m also using checkmk and have been happy with it. I had been using zabbix before but found it to be cumbersome and sluggish.