[metrics-bugs] #33941 [Internal Services/Tor Sysadmin Team]: Nagios checks for op-??.onionperf.torproject.net

Tor Bug Tracker & Wiki blackhole at torproject.org
Tue Apr 28 09:51:21 UTC 2020


#33941: Nagios checks for op-??.onionperf.torproject.net
-------------------------------------------------+---------------------
 Reporter:  karsten                              |          Owner:  tpa
     Type:  task                                 |         Status:  new
 Priority:  Medium                               |      Milestone:
Component:  Internal Services/Tor Sysadmin Team  |        Version:
 Severity:  Normal                               |     Resolution:
 Keywords:                                       |  Actual Points:
Parent ID:                                       |         Points:
 Reviewer:                                       |        Sponsor:
-------------------------------------------------+---------------------

Comment (by karsten):

 Okay. This is less about my personal preference for Nagios or against
 Prometheus. It's more about getting something very simple deployed for
 monitoring our OnionPerf instances in the next few days. If you prefer
 Prometheus for doing that, I'm fine with that. I hope that I don't have to
 learn much about Prometheus but can treat it as a black box that runs an
 application-specific check script and sends me an alert if something's
 broken. To be honest, that's also how I treat Nagios. Ultimately, this
 should be your decision. I'm just bringing in the soft requirement to have
 three running checks for `op-{nl,us,hk}2.onionperf.torproject.net` by the
 end of the month. If that's an impossible requirement I'll have to make
 new plans about keeping an AWS instance alive that I'd prefer to
 terminate.

 In the meantime I worked a bit on the log-file-downloading idea and came
 up with a slightly optimized plan: each OnionPerf instance could update a
 status file once per minute that it makes available via its web server,
 and Nagios or Prometheus could process that file and alert if something's
 off. That file could simply contain the latest ISO-8601 timestamp when
 OnionPerf found itself to be fully operational, like
 `2020-04-28T09:31:19Z`. Nagios or Prometheus would then send out an alert
 if that file cannot be downloaded or the contained timestamp cannot be
 parsed or is older than one hour. How does this sound?
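
 To make the idea a bit more concrete, here is a rough sketch of what such
 a check could look like in Python. It is only an illustration, not a
 finished plugin: the function names and the URL passed to `main` are made
 up, the status file is assumed to contain nothing but a single timestamp
 in the `2020-04-28T09:31:19Z` form, and the exit codes follow the usual
 Nagios plugin convention (0 for OK, 2 for CRITICAL). A Prometheus setup
 would need a different wrapper around the same logic.

```python
import sys
import urllib.request
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(hours=1)  # alert threshold proposed in the comment
OK, CRITICAL = 0, 2           # conventional Nagios plugin exit codes


def evaluate(body, now, max_age=MAX_AGE):
    """Return (exit_code, message) for the text content of a status file.

    Assumes the file holds a single UTC timestamp like 2020-04-28T09:31:19Z.
    """
    try:
        stamp = datetime.strptime(body.strip(), "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        return CRITICAL, "cannot parse timestamp: %r" % body[:40]
    # The trailing "Z" means UTC, so attach that zone explicitly.
    stamp = stamp.replace(tzinfo=timezone.utc)
    age = now - stamp
    if age > max_age:
        return CRITICAL, "last operational %s ago" % age
    return OK, "last operational %s ago" % age


def check(url, timeout=30):
    """Download the status file (hypothetical URL) and evaluate it."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # The file should be tiny; read only a bounded prefix.
            body = response.read(100).decode("ascii", errors="replace")
    except Exception as exc:
        # Any download failure counts as an alert condition.
        return CRITICAL, "cannot download %s: %s" % (url, exc)
    return evaluate(body, datetime.now(timezone.utc))


def main(argv):
    """Print a one-line status and return the exit code for the caller."""
    if len(argv) != 2:
        print("usage: check_onionperf_status URL")
        return CRITICAL
    code, message = check(argv[1])
    print(("OK: " if code == OK else "CRITICAL: ") + message)
    return code
```

 A wrapper script would end with `sys.exit(main(sys.argv))` and be pointed
 at whatever URL each instance ends up serving the status file from. The
 download and the parsing are kept in separate functions so that the
 timestamp logic can be tested without a running web server.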

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/33941#comment:4>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

