[metrics-bugs] #26035 [Metrics/Statistics]: Streamline sample quantile types used in the various modules
Tor Bug Tracker & Wiki
blackhole at torproject.org
Sat May 19 19:36:34 UTC 2018
#26035: Streamline sample quantile types used in the various modules
--------------------------------+------------------------------
Reporter: karsten | Owner: karsten
Type: enhancement | Status: needs_review
Priority: High | Milestone:
Component: Metrics/Statistics | Version:
Severity: Normal | Resolution:
Keywords: | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor: Sponsor13
--------------------------------+------------------------------
Changes (by karsten):
* status: assigned => needs_review
Comment:
Replying to [comment:14 iwakeh]:
> Taking you up on your offer from comment:13, so I can concentrate on
> reviews and tickets of CollecTor.
Alright, happy to implement this change.
Please review
[https://gitweb.torproject.org/karsten/metrics-web.git/log/?h=task-26035 my task-26035 branch]
with three commits:
- [https://gitweb.torproject.org/karsten/metrics-web.git/commit/?h=task-26035&id=4f92894a1ee5315b9e4a17b38f3cdb229612f0f1 4f92894]
changes how we compute the median and inter-quartile range in the
censorship detector code, which is still written in Python. I tested the
change by running it on our user number estimates. It changes 159 of 2447
days in our data (6.5%) and leaves the remaining days entirely unchanged.
This makes sense: with a slightly different median and inter-quartile
range, a borderline value can end up on the other side of the threshold,
so we either include it or exclude it as an outlier. I'd say we cannot
conclude that one of the implementations is correct and the other is not.
The new implementation will simply be more consistent throughout our code
base.
- [https://gitweb.torproject.org/karsten/metrics-web.git/commit/?h=task-26035&id=2685c78f13cbf9402d5ba0b4380df03f246e86e5 2685c78]
makes the same change to our advertised bandwidth statistics. This changes
results a bit, because we're now interpolating between reported advertised
bandwidths rather than returning a value that was actually reported by one
of the relays. Still, for the sake of consistency throughout our code
base, we should switch.
- [https://gitweb.torproject.org/karsten/metrics-web.git/commit/?h=task-26035&id=f9c24cab1006bf5999c662e9d06767c59c71a3e6 f9c24ca]
makes the third change in this series, this time to the connbidirect
module. The change is quite significant in 2011 and 2012, when we had just
a handful of relays reporting these statistics. With so few values, it
does make a difference whether we're interpolating or not. The same
consistency argument as above speaks in favor of making the change now.
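To illustrate why results shift in all three commits, here is a minimal
Python sketch comparing a quantile definition that always returns an
observed value with one that interpolates linearly between neighboring
values. This is not the code from the branch: the function names, the
sample values, and the 1.5 x IQR outlier threshold are assumptions made up
for this example, and the interpolating variant shown is the one commonly
referred to as R's type 7, which may or may not match what the commits
use.

```python
import math


def quantile_observed(values, p):
    """Quantile without interpolation: always returns a sample member.

    Hypothetical helper: picks the smallest observed value such that at
    least a fraction p of the sample is less than or equal to it.
    """
    ordered = sorted(values)
    if not ordered:
        raise ValueError("empty sample")
    rank = max(1, math.ceil(p * len(ordered)))  # 1-based rank
    return ordered[rank - 1]


def quantile_interpolated(values, p):
    """Quantile with linear interpolation between adjacent sorted values.

    Hypothetical helper following the definition usually called R type 7.
    """
    ordered = sorted(values)
    if not ordered:
        raise ValueError("empty sample")
    h = (len(ordered) - 1) * p          # fractional 0-based position
    lower = math.floor(h)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (h - lower) * (ordered[upper] - ordered[lower])


def outlier_bounds(values, quantile, k=1.5):
    """Return (low, high) bounds at k inter-quartile ranges from Q1 and Q3.

    The factor k=1.5 is an assumption for illustration only.
    """
    q1 = quantile(values, 0.25)
    q3 = quantile(values, 0.75)
    iqr = q3 - q1
    return (q1 - k * iqr, q3 + k * iqr)


def is_outlier(value, values, quantile):
    low, high = outlier_bounds(values, quantile)
    return value < low or value > high


# Made-up sample with only four values, similar in spirit to the early
# connbidirect data where just a handful of relays reported statistics.
sample = [10, 20, 30, 40]

for name, fn in (("observed", quantile_observed),
                 ("interpolated", quantile_interpolated)):
    print(name, "median:", fn(sample, 0.5),
          "bounds:", outlier_bounds(sample, fn),
          "58 is outlier:", is_outlier(58, sample, fn))
```

With this sample the two definitions disagree on the median (20 versus
25) and on the outlier bounds, so a borderline value such as 58 is kept
under one definition and excluded under the other. That is the same kind
of flip that shows up on a fraction of days in the censorship detector
results, and the gap between the definitions is widest when the sample is
small.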
I'm currently re-processing the descriptor archive for updated advbwdist
statistics (second commit above). Re-doing the clients and connbidirect
statistics using the updated code is much simpler. I hope to be ready to
deploy the change in the next few days. Ideally, we'd be done with the
review process by then, too. Thanks in advance!
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/26035#comment:15>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online