[metrics-bugs] #23367 [Metrics/Statistics]: Onion address counts ignore descriptor upload overlap
Tor Bug Tracker & Wiki
blackhole at torproject.org
Tue Jan 14 12:06:57 UTC 2020
#23367: Onion address counts ignore descriptor upload overlap
--------------------------------+------------------------------
Reporter: teor | Owner: metrics-team
Type: defect | Status: needs_review
Priority: Medium | Milestone:
Component: Metrics/Statistics | Version:
Severity: Normal | Resolution:
Keywords: | Actual Points:
Parent ID: #23126 | Points:
Reviewer: | Sponsor:
--------------------------------+------------------------------
Changes (by karsten):
* status: new => needs_review
* keywords: metrics-2018 =>
Comment:
Finally, I got it. (I didn't spend the whole 2 years thinking about this,
but when I started looking at this ticket again this morning, it took me a
while to understand the bug.)
The situation is slightly different from your description, because
statistics are not collected from 00:00 UTC but from whenever a relay
starts collecting them. Your general statement that we're accounting for
descriptor upload overlap incorrectly is right, though.
My current thought is to document this inaccuracy rather than change the
code. It's a known inaccuracy of roughly 1/24 = 4.2% of absolute numbers,
but it doesn't affect relative changes over time. I don't think that
changing the code and reprocessing the statistics is worth the effort,
especially since we would also have to explain why the numbers changed.
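To illustrate why a constant inaccuracy leaves relative changes untouched, here is a minimal sketch. It assumes the inaccuracy acts as a fixed multiplicative factor of 1/24 on every day's count. The daily counts and the helper names `biased` and `relative_change` are hypothetical and not taken from the metrics code.

```python
from fractions import Fraction

# Assumption: the overlap inflates every absolute count by the same factor.
BIAS = Fraction(1, 24)


def biased(true_count):
    """Return the count as it would be reported with the constant bias."""
    return true_count * (1 + BIAS)


def relative_change(before, after):
    """Return the relative change from one day's count to the next."""
    return (after - before) / before


# Hypothetical daily counts of unique .onion addresses.
true_day1 = Fraction(60000)
true_day2 = Fraction(66000)

true_change = relative_change(true_day1, true_day2)
reported_change = relative_change(biased(true_day1), biased(true_day2))

print(f"bias in absolute numbers: {float(BIAS) * 100:.1f}%")
print(f"true relative change:     {float(true_change) * 100:.1f}%")
print(f"reported relative change: {float(reported_change) * 100:.1f}%")
```

Because the same factor appears in the numerator and the denominator, it cancels out of the relative change, while the absolute numbers stay off by roughly 4.2%.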
Here's how we could document this on the
[https://metrics.torproject.org/reproducible-metrics.html#onion-services Reproducible Metrics]
page:
''As an approximation, we assume that an onion service publishes its
descriptor to twelve directories over a 24-hour period: the service stores
two replicas per descriptor using different descriptor identifiers, both
descriptor replicas get stored to three different onion-service
directories each, and the service changes descriptor identifiers once
every 24 hours which leads to two different descriptor identifiers per
replica.''
''To be clear, this approximation is not entirely accurate. For example,
'''the descriptors of roughly 1/24 of services are seen by 3 rather than 2
sets of onion-service directories, when a service changes descriptor
identifiers once at the beginning of a relay's statistics interval and
once again towards the end. In some cases,''' the two replicas or the
descriptors with changed descriptor identifiers could have been stored to
the same directory. As another example, onion-service directories might
have joined or left the network, and other directories might have become
responsible for storing a descriptor and therefore also include that
.onion address in their statistics. However, for the subsequent analysis,
we assume that none of these cases affects results substantially.''
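For reference, the arithmetic behind the twelve directories in the proposed text can be sketched as follows. The constant and function names are hypothetical. The final division by twelve is an assumption about how the approximation would be applied to a network-wide total of observations, not a copy of the actual metrics code.

```python
# Factors named in the proposed documentation text.
REPLICAS_PER_DESCRIPTOR = 2      # two replicas with different identifiers
DIRECTORIES_PER_REPLICA = 3      # each replica is stored on three directories
DESCRIPTOR_IDS_PER_REPLICA = 2   # identifiers change once per 24 hours


def directories_per_service():
    """Return the assumed number of directories that see one service per day."""
    return (REPLICAS_PER_DESCRIPTOR
            * DIRECTORIES_PER_REPLICA
            * DESCRIPTOR_IDS_PER_REPLICA)


def estimate_unique_services(total_observations):
    """Estimate unique services from observations summed over all directories.

    Assumption: every service is counted by exactly twelve directories.
    """
    return total_observations / directories_per_service()


# Hypothetical network-wide total of per-directory observations.
print(directories_per_service())
print(estimate_unique_services(720000))
```

Under this sketch, any service that is seen by a third set of directories adds extra observations to the total, which is the kind of overcount the caveats above describe.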
What do you think about this change?
I also agree that we should keep this in mind when we work on v3 stats. We
should keep this ticket open, turn it into an enhancement, and update the
summary a bit to make it clear that the remaining work is just for v3.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/23367#comment:7>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online