[tor-bugs] #13209 [Metrics/Analysis]: Write a hidden service hsdir health measurer
Tor Bug Tracker & Wiki
blackhole at torproject.org
Tue Dec 20 16:23:49 UTC 2016
#13209: Write a hidden service hsdir health measurer
 Reporter:  arma                                |          Owner:
     Type:  project                             |         Status:  closed
 Priority:  High                                |      Milestone:
Component:  Metrics/Analysis                    |        Version:  Tor: 0.2.7
 Severity:  Normal                              |     Resolution:  fixed
 Keywords:  SponsorR, tor-hs, 027-triaged-1-in  |  Actual Points:
Parent ID:                                      |         Points:  medium/large
 Reviewer:                                      |        Sponsor:
Changes (by dgoulet):
* status: assigned => closed
* resolution: => fixed
* severity: => Normal
OK, I will close this ticket, but first here are some conclusions and
possible future work. I'm attaching to this ticket the raw results
collected from May 29th, 2015 to June 14th, 2016. You can find the CSV file
specification in https://gitlab.com/hs-health/hs-health/blob/master
This experiment showed us a few things. With a client always using the
latest consensus, here are the results for the six stable .onion services
we monitored (output from analyze-csv.py).
Log health.csv period is from 29 May 2015 16:36:03 to 15 Jun 2016 00:08:15
(9175 hours)
--> 2.721% failed fetch (3958/145435).
On average once we fail to fetch once on a specific HSDir, the
descriptor was missing for 01:14:31 (4471 seconds).
[+] wlupld3ptjvsgwqw.onion
3.35% of failed fetch (913/27270) for an average time of 01:29:09
minutes (5349 seconds)
After first fail on an HSDir, we have 7.55 failed attempt(s) before
Churn happened 1.319% of the time (121 times)
[+] 3g2upl4pq6kufc4m.onion
1.80% of failed fetch (524/29099) for an average time of 00:50:19
minutes (3019 seconds)
After first fail on an HSDir, we have 3.94 failed attempt(s) before
Churn happened 1.450% of the time (133 times)
[+] agorahooawayyfoe.onion
5.07% of failed fetch (596/11744) for an average time of 01:21:02
minutes (4862 seconds)
After first fail on an HSDir, we have 6.77 failed attempt(s) before
Churn happened 0.959% of the time (88 times)
[+] 4cjw6cwpeaeppfqz.onion
3.11% of failed fetch (886/28495) for an average time of 01:28:32
minutes (5312 seconds)
After first fail on an HSDir, we have 7.38 failed attempt(s) before
Churn happened 1.308% of the time (120 times)
[+] zti6p7h6spbtx5xr.onion
3.05% of failed fetch (497/16289) for an average time of 01:18:47
minutes (4727 seconds)
After first fail on an HSDir, we have 6.54 failed attempt(s) before
Churn happened 0.828% of the time (76 times)
[+] facebookcorewwwi.onion
1.90% of failed fetch (542/28580) for an average time of 01:04:11
minutes (3851 seconds)
After first fail on an HSDir, we have 4.93 failed attempt(s) before
Churn happened 1.199% of the time (110 times)
As we can see, it's pretty stable. The churn rate is very low and _always_
affects only a single HSDir out of the set of 6 (see the .csv results; this
is not printed in the output). On average, a client with the latest
consensus will fail to fetch the descriptor from one HSDir out of the six
~2.72% of the time.
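
As a cross-check, the per-service percentages can be recomputed from the raw counts in the output above. This is only a minimal sketch and not part of analyze-csv.py: the function names are made up here, and dividing the churn count by the 9175 hours of the log period is an inference from the printed numbers, not something the tool documents.

```python
# Counts copied from the analyze-csv.py output above: (failed, total, churn).
RESULTS = {
    "wlupld3ptjvsgwqw.onion": (913, 27270, 121),
    "3g2upl4pq6kufc4m.onion": (524, 29099, 133),
    "agorahooawayyfoe.onion": (596, 11744, 88),
    "4cjw6cwpeaeppfqz.onion": (886, 28495, 120),
    "zti6p7h6spbtx5xr.onion": (497, 16289, 76),
    "facebookcorewwwi.onion": (542, 28580, 110),
}
LOG_HOURS = 9175  # length of the log period reported by the tool


def failure_rate(failed, total):
    """Percentage of fetches that failed (hypothetical helper)."""
    return 100.0 * failed / total


def churn_rate(churn_events, hours=LOG_HOURS):
    """Percentage of hours with churn, ASSUMING one churn check per hour."""
    return 100.0 * churn_events / hours


for onion, (failed, total, churn) in RESULTS.items():
    print(f"{onion}: {failure_rate(failed, total):.2f}% failed, "
          f"{churn_rate(churn):.3f}% churn")
```

Under these assumptions the recomputed values agree with the per-service lines printed by the tool.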
The number of fetches varies because, unfortunately, the tool is not
entirely "stable": it sometimes crashed, and for some periods of time
we went without fetching some .onion while others were still running
(python threading is ... something....).
=== Conclusion ===
1. This experiment is not ideal as it '''only''' considers the latest
consensus on the client side, which is not what real clients have. An
improved version of this tool would run 12 clients, each with a consensus
from a different hour, spanning 12 hours. Each client would then try to
fetch the descriptor and note down churn and failures.
2. One key aspect of this tool is that once a fetch failed, it went
into "recover mode", that is, retrying every 15 minutes until the descriptor
could be fetched again. This gives us the interesting statistics of how many
attempts fail before a success and how much time a client needs to spend
waiting until that success. This gets a bit more complicated with clients on
different consensuses because they need to update their consensus at some
point, and deciding which consensus to update to (latest, or 2 hours in the
past, or ...) might affect the results and also creates LOTS of cases to
test.
3. A simpler but, I think, better version of this tool would, instead
of taking the latest consensus all the time, simply use the normal tor
client behavior and monitor the .onion with it. However, the HS client-side
behavior has changed across some stable tor versions and might change
again, so this should be done for each maintained tor version, which
would also show us any regression or performance improvement
between them.
4. Load on the network. It is all fun and well, but if we
decide to improve this tool (or write a new one), we should consider how
much load it puts on the network. HS fetches aren't that heavy, but if you
multiply this by 12 clients times 6 HSDirs and then run it every X minutes,
let's not forget what it can do to the Guard in front.
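
The "recover mode" from point 2 could look roughly like the sketch below. This is not the code of the actual tool: `recover`, `FakeClock` and the `fetch` callable are hypothetical names, and counting the failure that triggered recover mode as the first failed attempt is an assumption. The sleep and clock functions are injected so the logic can be exercised without really waiting 15 minutes.

```python
import time

RETRY_INTERVAL = 15 * 60  # seconds; the 15-minute retry from point 2


def recover(fetch, interval=RETRY_INTERVAL,
            sleep=time.sleep, clock=time.monotonic):
    """Retry fetch() every `interval` seconds until it succeeds.

    Call this right after a first failed fetch. Returns a tuple
    (failed_attempts, seconds_waited); the failure that triggered
    recover mode is counted as failed attempt number one (assumption).
    """
    start = clock()
    failed = 1
    while True:
        sleep(interval)
        if fetch():
            return failed, clock() - start
        failed += 1


class FakeClock:
    """Simulated time source so the demo below does not really sleep."""

    def __init__(self):
        self.now = 0

    def sleep(self, seconds):
        self.now += seconds

    def time(self):
        return self.now


# Demo: the descriptor stays missing for two retries, then comes back.
demo_clock = FakeClock()
outcomes = iter([False, False, True])
print(recover(lambda: next(outcomes),
              sleep=demo_clock.sleep, clock=demo_clock.time))
```

Averaging the two returned values over all recover episodes would give figures comparable to the "failed attempt(s)" and "average time" lines in the output, if the tool counts them the same way.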
So, all things considered, there is much room for improvement in
this tool. The results could be useful to have on our metrics website,
but we need to make the tool a bit smarter and at the very least change it
as described in point 3.
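
To put a rough number on the load concern from point 4, here is a back-of-the-envelope sketch. The function name is made up, the 15-minute interval is only a placeholder for the unspecified "X minutes", and it assumes every client probes every HSDir of every monitored .onion once per interval.

```python
def fetches_per_hour(clients=12, hsdirs=6, onions=6, interval_minutes=15):
    """Descriptor fetches per hour for the whole measurement setup.

    Assumes each client fetches from each HSDir of each .onion once
    per interval; all default values are illustrative.
    """
    rounds_per_hour = 60 / interval_minutes
    return clients * hsdirs * onions * rounds_per_hour


print(fetches_per_hour())
```

With these placeholder values, the six .onion services from this experiment would generate 1728 fetches per hour, all going through whatever Guard(s) the measuring clients use.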
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/13209#comment:14>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online