[metrics-bugs] #25144 [Metrics/Statistics]: op-us onionperf instance spends much of its time at 100% timeout failure: why?
Tor Bug Tracker & Wiki
blackhole at torproject.org
Mon Jul 2 16:48:10 UTC 2018
#25144: op-us onionperf instance spends much of its time at 100% timeout failure:
why?
--------------------------------+------------------------------
Reporter: arma | Owner: metrics-team
Type: defect | Status: new
Priority: Medium | Milestone:
Component: Metrics/Statistics | Version:
Severity: Normal | Resolution:
Keywords: | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
--------------------------------+------------------------------
Comment (by irl):
It appeared to be running normally; I suspect it has a bug where it stops
working. The obvious explanation, "the tgen server went away and there was
nothing to talk to", didn't seem to be the cause.
Improving monitoring of metrics services is on the roadmap. As a temporary
measure I can keep an eye on these CSV files to check that the service
looks like it is running. A medium-term goal may be to write a Nagios
plugin (a relatively simple task) that checks the latest CSV files for 100%
failure rates and alerts on them. A longer-term goal would be both to track
down the cause of the failure and to instrument OnionPerf so that it can
report such failures before the 24h cycle is complete.
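
As a rough illustration of the Nagios plugin idea, here is a minimal sketch
of the check logic. It is not an existing plugin. The column names (date,
source, requests, timeouts, failures), the function name and the threshold
parameter are all assumptions made for the example; the real CSV layout may
differ. Only the exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) follow
the usual Nagios plugin convention.

```python
import csv
import io

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_failure_rate(csv_text, source, threshold=1.0):
    """Return (exit_code, message) for the newest CSV row of `source`.

    ASSUMPTION: the CSV has the hypothetical columns
    date, source, requests, timeouts, failures, with ISO 8601 dates.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [row for row in reader if row.get("source") == source]
    if not rows:
        return UNKNOWN, f"UNKNOWN: no rows found for {source}"

    # ISO 8601 dates sort correctly as plain strings.
    latest = max(rows, key=lambda row: row.get("date") or "")
    try:
        requests = int(latest["requests"])
        failed = int(latest["timeouts"]) + int(latest["failures"])
    except (KeyError, TypeError, ValueError):
        return UNKNOWN, f"UNKNOWN: could not parse latest row for {source}"
    if requests <= 0:
        return UNKNOWN, f"UNKNOWN: no requests recorded for {source}"

    rate = failed / requests
    if rate >= threshold:
        return CRITICAL, f"CRITICAL: {source} failure rate {rate:.0%}"
    return OK, f"OK: {source} failure rate {rate:.0%}"
```

A real plugin would read the latest CSV file from wherever it is published,
print the message on one line and call sys.exit() with the returned code.
The default threshold of 1.0 alerts only on a 100% failure rate, matching
the case described in this ticket; a lower value would alert earlier.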
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/25144#comment:5>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online