[tor-bugs] #29410 [Internal Services/Tor Sysadmin Team]: Can Prometheus help with multiple checks turning into one single alarm?

Wed Mar 6 23:07:06 UTC 2019

#29410: Can Prometheus help with multiple checks turning into one single alarm?
-------------------------------------------------+---------------------
 Reporter:  ln5                                  |          Owner:  tpa
     Type:  task                                 |         Status:  new
 Priority:  Medium                               |      Milestone:
Component:  Internal Services/Tor Sysadmin Team  |        Version:
 Severity:  Normal                               |     Resolution:
 Keywords:                                       |  Actual Points:
Parent ID:                                       |         Points:
 Reviewer:                                       |        Sponsor:
-------------------------------------------------+---------------------

Comment (by anarcat):

 it depends what you mean by "multiple checks" or "duplicates". prom's
 alerting system is designed to be highly available (HA), so even if
 multiple alerting nodes fire the same alert, the user receives only one
 notification (Alertmanager deduplicates them).
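
 As a rough illustration of that deduplication, here is a minimal Python
 sketch. It assumes an alert is identified purely by its label set; the
 alert names, labels, and helper functions are hypothetical and this is
 not Alertmanager's actual code.

```python
from typing import Dict, Iterable, List

Alert = Dict[str, str]  # an alert is modelled here as a plain label set


def fingerprint(alert: Alert) -> frozenset:
    """Identity of an alert: its full label set, order-independent."""
    return frozenset(alert.items())


def deduplicate(alerts: Iterable[Alert]) -> List[Alert]:
    """Keep the first alert seen for each distinct label set."""
    seen = set()
    unique = []
    for alert in alerts:
        key = fingerprint(alert)
        if key not in seen:
            seen.add(key)
            unique.append(alert)
    return unique


# Two hypothetical redundant nodes report the same failing check.
from_node_a = {"alertname": "HostDown", "instance": "tpo"}
from_node_b = {"instance": "tpo", "alertname": "HostDown"}
# A slightly different check has a different label set, so it is kept.
other_check = {"alertname": "HostDownV6", "instance": "tpo"}

notifications = deduplicate([from_node_a, from_node_b, other_check])
```

 Identical alerts from redundant nodes collapse into one, while a check
 with even one differing label is treated as a separate alert.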

 but i suspect that if you have multiple slightly different checks, it
 *will* send multiple alerts. the key here, in my experience with nagios
 anyways, is to set up alerting dependencies so that if service A (e.g.
 http://tpo) fails because of service B (e.g. ICMP to tpo *or* ICMPv6 to
 tpo), you only get warned once (e.g. for service B). but it's tricky to
 set up, easy to get wrong, and i'm not sure prometheus supports such
 dependency chains.
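
 The dependency idea can be sketched in a few lines of Python. This is
 only an illustration of the logic, not how nagios or prometheus
 implement it: the check names and the DEPENDS_ON map are hypothetical.
 (Alertmanager does have inhibition rules, which mute one alert while
 another is firing; whether they cover full dependency chains is not
 settled here.)

```python
from typing import Dict, List, Set

# Hypothetical dependency map: each check lists the checks it depends on.
DEPENDS_ON: Dict[str, List[str]] = {
    "http": ["icmp", "icmpv6"],
    "icmp": [],
    "icmpv6": [],
}


def checks_to_alert(failing: Set[str], depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return the failing checks that have no failing dependency,
    direct or indirect, so only probable root causes are reported."""

    def has_failing_dependency(check: str, visited: Set[str]) -> bool:
        for dep in depends_on.get(check, []):
            if dep in visited:
                continue  # already examined; also guards against cycles
            visited.add(dep)
            if dep in failing or has_failing_dependency(dep, visited):
                return True
        return False

    return {c for c in failing if not has_failing_dependency(c, {c})}
```

 With this map, checks_to_alert({"http", "icmp"}, DEPENDS_ON) reports
 only "icmp", while an HTTP failure with both ICMP checks healthy is
 still reported as "http". The hard part is keeping the map correct,
 which is where such setups tend to go wrong.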

 I found prometheus to be somewhat lacking in monitoring: the HA design is
 good, but there's nothing like the vast diversity of service checks that
 nagios has. in particular, you can set thresholds and there *are* many
 plugins to monitor a lot of things, but they don't come with predefined
 limits, like "90% disk usage or load of NCPU*2 is WARNING", so you need to
 define those on your own.
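
 For illustration, the two example limits quoted above could be written
 out by hand roughly like this. It is a Python sketch, not prometheus
 rule syntax; the function names and the choice of ">=" rather than ">"
 are assumptions.

```python
OK, WARNING = "OK", "WARNING"


def disk_status(used_fraction: float, warn_at: float = 0.90) -> str:
    """WARNING once disk usage reaches the threshold (90% by default)."""
    return WARNING if used_fraction >= warn_at else OK


def load_status(load1: float, ncpu: int, factor: float = 2.0) -> str:
    """WARNING once the load average reaches NCPU * factor (2 by default)."""
    return WARNING if load1 >= ncpu * factor else OK


# Example values, made up for the sketch.
disk_example = disk_status(0.95)        # 95% full
load_example = load_status(3.0, ncpu=4)  # load 3.0 on a 4-CPU host
```

 Every such limit has to be chosen and maintained by whoever writes the
 alerting rules, which is the extra work described above.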

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/29410#comment:1>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online