[tor-bugs] #29410 [Internal Services/Tor Sysadmin Team]: Can Prometheus help with multiple checks turning into one single alarm?
Tor Bug Tracker & Wiki
blackhole at torproject.org
Wed Mar 6 23:07:06 UTC 2019
#29410: Can Prometheus help with multiple checks turning into one single alarm?
-------------------------------------------------+---------------------
Reporter: ln5 | Owner: tpa
Type: task | Status: new
Priority: Medium | Milestone:
Component: Internal Services/Tor Sysadmin Team | Version:
Severity: Normal | Resolution:
Keywords: | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
-------------------------------------------------+---------------------
Comment (by anarcat):
it depends on what you mean by "multiple checks" or "duplicates". prom's
alerting system is designed to be highly available (HA), so even if
multiple alerting nodes fire the same alert, the user will receive only
one notification (thanks to some Alertmanager magic that deduplicates
identical alerts).
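a minimal sketch of that deduplication idea, assuming an alert is identified only by its full label set (so two replicas firing the same rule for the same host count as one alert). this models the concept and is not the Alertmanager's actual code; the alert name `HostDown` and the `deduplicate` helper are hypothetical:

```python
from typing import Dict, FrozenSet, List, Tuple

Labels = Dict[str, str]


def fingerprint(labels: Labels) -> FrozenSet[Tuple[str, str]]:
    """Identify an alert by its full label set, ignoring which node sent it."""
    return frozenset(labels.items())


def deduplicate(received: List[Labels]) -> List[Labels]:
    """Keep one copy of each distinct alert, preserving arrival order."""
    seen = set()
    unique = []
    for labels in received:
        fp = fingerprint(labels)
        if fp not in seen:
            seen.add(fp)
            unique.append(labels)
    return unique


# Two hypothetical Prometheus replicas firing the same rule for the same host.
from_replica_a = {"alertname": "HostDown", "instance": "www.torproject.org"}
from_replica_b = {"alertname": "HostDown", "instance": "www.torproject.org"}
print(len(deduplicate([from_replica_a, from_replica_b])))  # prints 1
```

note this only collapses *identical* alerts; two checks with different labels still produce two entries, which is the case discussed next.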
but i suspect that if you have multiple slightly different checks, it
*will* send multiple alerts. the key here, in my experience with nagios
anyway, is to set up alerting dependencies so that if service A (e.g.
http://tpo) fails because of service B (e.g. ICMP to tpo *or* ICMPv6 to
tpo), you only get warned once (e.g. for service B). but it's tricky to
set up, easy to get wrong, and i'm not sure prometheus supports such
dependency chains.
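the Alertmanager's `inhibit_rules` (source and target matchers plus an `equal` list of labels both alerts must share) look like the closest thing to such dependencies, though they are one level of "cause mutes symptom" rather than full chains. a rough sketch of that logic, assuming exact label matching only; `InhibitRule`, `notify` and the alert names `IcmpDown`/`HttpDown` are hypothetical illustrations, not the Alertmanager's API:

```python
from dataclasses import dataclass, field
from typing import Dict, List

Labels = Dict[str, str]


@dataclass
class InhibitRule:
    """Simplified, exact-match model of an Alertmanager inhibit rule."""
    source: Labels  # labels the "cause" alert must carry (e.g. ICMP failure)
    target: Labels  # labels the "symptom" alert must carry (e.g. HTTP failure)
    equal: List[str] = field(default_factory=list)  # labels both must share


def matches(labels: Labels, required: Labels) -> bool:
    """True if every required label is present with the same value."""
    return all(labels.get(k) == v for k, v in required.items())


def is_inhibited(alert: Labels, firing: List[Labels], rule: InhibitRule) -> bool:
    """True if the alert is a target and some other firing alert is a source."""
    if not matches(alert, rule.target):
        return False
    return any(
        src is not alert
        and matches(src, rule.source)
        and all(src.get(name) == alert.get(name) for name in rule.equal)
        for src in firing
    )


def notify(firing: List[Labels], rules: List[InhibitRule]) -> List[Labels]:
    """Return the alerts that would still be sent once inhibition is applied."""
    return [a for a in firing if not any(is_inhibited(a, firing, r) for r in rules)]


# Hypothetical alerts: HTTP to tpo fails *because* ICMP to tpo fails.
firing = [
    {"alertname": "IcmpDown", "instance": "www.torproject.org"},
    {"alertname": "HttpDown", "instance": "www.torproject.org"},
]
rule = InhibitRule(
    source={"alertname": "IcmpDown"},
    target={"alertname": "HttpDown"},
    equal=["instance"],
)
print(notify(firing, [rule]))  # only the IcmpDown alert remains
```

the "ICMP *or* ICMPv6" case would be two rules with the same target, one per cause; the `equal=["instance"]` part is what keeps an ICMP failure on one host from muting HTTP alerts on another.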
I found prometheus to be somewhat lacking for monitoring: the HA design is
good, but there's nothing like the vast diversity of service checks that
nagios has. in particular, you can set thresholds and there *are* many
plugins (exporters) to monitor a lot of things, but they don't come with
predefined limits, like "90% disk usage or load of NCPU*2 is WARNING", so
you need to define those on your own.
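as an illustration of what "define those on your own" means, here is a sketch of the two example limits. the PromQL in the comments assumes the standard node_exporter metric names and is a starting point rather than a tested rule; `check_host` is a hypothetical helper:

```python
from typing import List

# Thresholds taken from the example above; nothing ships these by default.
DISK_WARN_RATIO = 0.90  # "90% disk usage"
LOAD_WARN_PER_CPU = 2   # "load of NCPU*2"

# Roughly equivalent Prometheus alert expressions, assuming node_exporter:
#   1 - node_filesystem_avail_bytes / node_filesystem_size_bytes >= 0.90
#   node_load1 >= on(instance) 2 * count by (instance) (node_cpu_seconds_total{mode="idle"})


def check_host(disk_used_ratio: float, load1: float, ncpu: int) -> List[str]:
    """Return WARNING messages for one host, using self-defined limits."""
    warnings = []
    if disk_used_ratio >= DISK_WARN_RATIO:
        warnings.append(f"WARNING: disk {disk_used_ratio:.0%} full")
    if load1 >= LOAD_WARN_PER_CPU * ncpu:
        warnings.append(f"WARNING: load {load1} >= {LOAD_WARN_PER_CPU * ncpu}")
    return warnings


print(check_host(disk_used_ratio=0.95, load1=3.0, ncpu=4))
print(check_host(disk_used_ratio=0.50, load1=9.0, ncpu=4))
```

the load limit has to be scaled by the CPU count of each host, which is why the PromQL version joins `node_load1` against a per-instance CPU count instead of using a fixed number.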
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/29410#comment:1>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online