What do y’all use to monitor many linux servers?

shootwhatsmyname@lemm.ee · 1 day ago

What do y’all use to monitor many linux servers?

ddh@lemmy.sdf.org · 1 hour ago

I use my family. It has a simple volume based alert for when services are offline.

vfsh@lemmy.blahaj.zone · 1 hour ago

It’ll even automatically configure variable alert volumes corresponding to the importance of the service!

utopiah · 6 hours ago

send alerts via http request

On this specifically, you might want to check out ntfy, as it’s quite easy to set up and can give you notifications on pretty much any device (including iOS) via your own infrastructure, all the way down to basics, e.g. SSE. That means you can subscribe to a topic, e.g. servers per physical location, alert level, etc., and only get the ones you need.
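For the "alerts via http request" part, here is a minimal sketch of publishing to an ntfy topic with nothing but the Python standard library. The server URL `https://ntfy.example.com` and the topic `dc1-critical` are made-up placeholders, and the helper names are mine; the only ntfy behaviour relied on is that a POST to `<server>/<topic>` publishes the request body, with optional `Title` and `Priority` headers.

```python
import urllib.request


def build_ntfy_request(base_url, topic, message, title=None, priority=None):
    """Build a POST request that publishes `message` to an ntfy topic.

    base_url and topic are placeholders for your own server and topic naming.
    """
    headers = {}
    if title:
        headers["Title"] = title
    if priority:
        # ntfy accepts names such as "min", "low", "default", "high", "urgent"
        headers["Priority"] = priority
    url = f"{base_url.rstrip('/')}/{topic}"
    return urllib.request.Request(
        url, data=message.encode("utf-8"), headers=headers, method="POST"
    )


def send_alert(request, timeout=10):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical server and topic; replace with your own before running.
    req = build_ntfy_request(
        "https://ntfy.example.com",
        "dc1-critical",
        "nginx on web01 is down",
        title="Service down",
        priority="urgent",
    )
    print(send_alert(req))
```

Using one topic per location or alert level (as in `dc1-critical` here) is just one way to slice things so each device subscribes only to what it needs.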

utopiah · 6 hours ago

Node exporter, Prometheus and grafana

Otherwise much heavier but that’s also what I use.

MrPoopyButthole@lemmy.dbzer0.com · 5 hours ago

same

elucubra@sopuli.xyz · 5 hours ago

Ages ago I used to use Webmin. I have no clue as to how it stacks up against others nowadays.

Andromxda 🇺🇦🇵🇸🇹🇼@lemmy.dbzer0.com · edit-2 14 hours ago

Netdata is exactly what you’re looking for. It’s basically an all-in-one monitoring and alerting suite that collects and analyzes data, and provides a gorgeous web dashboard for you to view.

You can also manually replicate this using Prometheus, Grafana and other tools, but that requires a much bigger effort to set up.

Edit: There’s a public demo instance where you can try everything out: https://frankfurt.netdata.rocks/

ikidd@lemmy.world · 11 hours ago

I think they went to 5 nodes max on the free version as of the last patch. That’s damn near useless.

Phoenixz@lemmy.ca · edit-2 17 hours ago

We just recently started using zabbix. Open source and has a web interface to get a central view that can be accessed from wherever we allow it.

So far it’s been great, but we have had little time and so far have used only 1% of what it can do.

Still, I’d recommend it. Super easy to install, seems lightweight, has clients for any OS you’d need, and can send out alerts (we currently use Pushover for that).

Mora@pawb.social · 22 hours ago

Beszel. Probably the easiest tool of all those mentioned in this thread.

https://github.com/henrygd/beszel

JustARegularNerd@aussie.zone · 20 hours ago

Seconded. My only complaint (though this might already be a feature I haven’t found yet) is that it doesn’t seem to support multiple drives. But yes, it is shit easy to set up and has a beautiful UI.

Mora@pawb.social · 19 hours ago

Totally possible:

https://beszel.dev/guide/additional-disks

JustARegularNerd@aussie.zone · 16 hours ago

I no longer have any complaints about Beszel. Thank you!

tath@social.tath.link · 22 hours ago

Zabbix is pretty quick and easy. Many different services built in for sending notifications, along with your own custom (including webhooks). Fully customizable dashboard as well so you can add whatever you want/need at a glance.

8adger@lemmy.world · 22 hours ago

I stopped by to say the same thing. I use Zabbix to monitor everything

reisub@discuss.tchncs.de · 1 day ago

Node exporter, Prometheus and grafana

MrPoopyButthole@lemmy.dbzer0.com · 5 hours ago

this is the way

dann [any]@hexbear.net · 7 hours ago

This

eldereko@lemmy.dbzer0.com · 22 hours ago

telegraf, influxdb, grafana, and gatus

iii@mander.xyz · edit-2 1 day ago

uptime-kuma is what I use

spicehoarder@lemm.ee · 18 hours ago

Not exactly what you’re looking for, but I like using proxmox

loganb@lemmy.world · 1 day ago

I personally use CheckMK.

Offers a free “Raw” version.
Can be deployed with docker.
OSS

One thing is that it can be a lot to take in at first and took me a while to get used to it.

corsicanguppy@lemmy.ca · 10 hours ago

CheckMk user here via omd.

I’m looking for something else after the upgrade.

Black interface isn’t pretty for me and the old interface was “meh too hard so we ditched it”.
One half of the project split has a shit supply chain and just doesn’t meet the bar for upgrade requirements.
The other half of the project split is a mess to config in an automated desired-state setup. It’s all edge-triggered manual bullshit. NO. ENOUGH.

I miss 1.2 .

hobbsc@lemmy.sdf.org · 14 hours ago

checkmk user here. i can second the adjustment phase. i tend to ignore my servers but when something goes sideways it’s awesome to have checkmk’s structure in place.

RegalPotoo@lemmy.world · 1 day ago

Base ansible role installs Prometheus node exporter, configured with the text file collector
VM automations push DNS records so that the Prometheus dns-sd automatically discovers them
Ansible roles add cron jobs that generate metrics for specific systems and dump them for the text file collector
Grafana for dashboards
Karma as a UI in front of Prometheus alert manager

tetris11 · 1 day ago

Cron jobs that generate metrics for specific systems and dump them for the text file collector

Details please

RegalPotoo@lemmy.world · 17 hours ago

https://github.com/prometheus/node_exporter?tab=readme-ov-file#textfile-collector - which makes node exporter watch a specific directory for files that contain metrics, then re-export them back to the central Prometheus server
Some systems have their own metrics endpoints - instead of getting Prometheus to scrape these directly I set up a Cron job to curl these into files for node exporter - this means I don’t need extra config in Prometheus to find the endpoints, and don’t need to mess with firewall rules
Other systems don’t directly expose metrics in a format Prometheus can use - in this case I will write/find a script that can do the conversion, then either set it up to write the metrics file directly and run it on a Cron, or run it as a service and use another Cron job to do the scrape
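As a rough illustration of the "write the metrics file directly" option, here is a minimal Python sketch that a cron job could run. The metric name, label, and directory are hypothetical examples, not my actual setup; what it relies on is that the textfile collector reads `*.prom` files from the directory passed to node exporter, and that writing to a temp file and renaming avoids node exporter reading a half-written file.

```python
import os
import tempfile


def escape_label(value):
    """Escape backslash, double quote and newline for a label value."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_metric(name, value, labels=None, help_text=None, metric_type="gauge"):
    """Render one metric in the Prometheus text exposition format."""
    lines = []
    if help_text:
        lines.append(f"# HELP {name} {help_text}")
    lines.append(f"# TYPE {name} {metric_type}")
    label_str = ""
    if labels:
        pairs = ",".join(f'{k}="{escape_label(v)}"' for k, v in sorted(labels.items()))
        label_str = "{" + pairs + "}"
    lines.append(f"{name}{label_str} {value}")
    return "\n".join(lines) + "\n"


def write_prom_file(directory, filename, content):
    """Write content to directory/filename atomically (temp file + rename)."""
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(content)
        # mkstemp creates the file as 0600; node exporter usually runs as another user
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, os.path.join(directory, filename))
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


if __name__ == "__main__":
    # Hypothetical directory; it must match node exporter's textfile directory setting.
    text = format_metric(
        "backup_last_success_timestamp_seconds",
        1700000000,
        labels={"job": "nightly"},
        help_text="Unix time of the last successful backup",
    )
    write_prom_file("/var/lib/node_exporter/textfile", "backup.prom", text)
```

The temp file uses a `.tmp` suffix so the collector ignores it until the rename puts the finished `.prom` file in place.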

LainTrain@lemmy.dbzer0.com · 1 day ago

Cockpit.

corsicanguppy@lemmy.ca · 10 hours ago

My cockpit experience has been unilaterally dreadful. I’m glad you’re getting value out of it.

LainTrain@lemmy.dbzer0.com · 4 hours ago

How come?

hobbsc@lemmy.sdf.org · 14 hours ago

is cockpit on a server by server basis or can you monitor multiple servers with it?

dkc@lemmy.world · 23 hours ago

I’ve been really enjoying Cockpit as well.

notabot@lemm.ee · 1 day ago

Nagios. It does depend on what you mean by monitor, though. Nagios is good at telling you that “service A on host B is down” but less useful for looking at things like performance trends. I particularly like being able to set up dependencies between services, so I get the alert for the root cause, and not for all of the services that have gone down because of it.
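To show the idea behind dependency-based alerting, here is a toy Python sketch. It is not Nagios configuration and not how Nagios implements it internally; the service names and the `root_causes` helper are invented for illustration, and the dependency map is assumed to have no cycles. A down service is only reported if nothing it depends on is also down.

```python
def root_causes(down, depends_on):
    """Return the down services whose own dependencies are all still up.

    down:       iterable of service names currently failing
    depends_on: dict mapping a service to the services it depends on
    """
    down = set(down)
    return sorted(
        service
        for service in down
        if not any(dep in down for dep in depends_on.get(service, ()))
    )


if __name__ == "__main__":
    # Hypothetical topology: web needs db, db needs storage, mail stands alone.
    dependencies = {"web": ["db"], "db": ["storage"], "mail": []}
    print(root_causes({"web", "db", "storage"}, dependencies))
```

With the example topology, an outage of storage, db and web together collapses to a single alert for storage, which is the behaviour I was describing.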