[metrics-bugs] #26035 [Metrics/Statistics]: Streamline sample quantile types used in the various modules

Tor Bug Tracker & Wiki blackhole at torproject.org
Sat May 19 19:36:34 UTC 2018


#26035: Streamline sample quantile types used in the various modules
--------------------------------+------------------------------
 Reporter:  karsten             |          Owner:  karsten
     Type:  enhancement         |         Status:  needs_review
 Priority:  High                |      Milestone:
Component:  Metrics/Statistics  |        Version:
 Severity:  Normal              |     Resolution:
 Keywords:                      |  Actual Points:
Parent ID:                      |         Points:
 Reviewer:                      |        Sponsor:  Sponsor13
--------------------------------+------------------------------
Changes (by karsten):

 * status:  assigned => needs_review


Comment:

 Replying to [comment:14 iwakeh]:
 > Taking you up on your offer from comment:13, so I can concentrate on
 > reviews and tickets of CollecTor.

 Alright, happy to implement this change.

 Please review [https://gitweb.torproject.org/karsten/metrics-web.git/log/?h=task-26035 my task-26035 branch]
 with three commits:

  - [https://gitweb.torproject.org/karsten/metrics-web.git/commit/?h=task-26035&id=4f92894a1ee5315b9e4a17b38f3cdb229612f0f1 4f92894]
 changes how we compute the median and inter-quartile range in the
 censorship detector code, which is still written in Python. I tested the
 change by running it on our user number estimates. It changes 159 of 2447
 days in our data (6.5%) and leaves the remaining days entirely unchanged.
 This is expected: a slightly different median and inter-quartile range
 only changes a day's result when it flips a value from being included to
 being excluded as an outlier, or vice versa. I'd say we cannot conclude
 that one of the implementations is correct and the other is not. The new
 implementation will simply be more consistent throughout our code base.

  - [https://gitweb.torproject.org/karsten/metrics-web.git/commit/?h=task-26035&id=2685c78f13cbf9402d5ba0b4380df03f246e86e5 2685c78]
 makes the same change to our advertised bandwidth statistics. This changes
 the results somewhat, because we now interpolate between reported
 advertised bandwidths rather than returning a value that one of the relays
 actually reported. Still, for the sake of consistency throughout our code
 base, we should switch.

  - [https://gitweb.torproject.org/karsten/metrics-web.git/commit/?h=task-26035&id=f9c24cab1006bf5999c662e9d06767c59c71a3e6 f9c24ca]
 makes the third change in this series, this time to the connbidirect
 module. The change is quite significant in 2011 and 2012, when only a
 handful of relays reported these statistics; with so few values, it makes
 a real difference whether we interpolate or not. The same consistency
 argument speaks in favor of switching now.
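
 The commits above move from returning an actually reported value to
 interpolating between reported values. The following is a minimal Python
 sketch of that difference, for illustration only. It assumes the
 interpolating variant is the linear definition used by default in R's
 quantile() and NumPy's percentile() (Hyndman and Fan type 7), and that the
 old, non-interpolating behavior resembles the inverse empirical CDF (type
 1); the ticket does not say which types the branch actually uses. The
 outlier rule in the hypothetical is_outlier() helper and its factor k are
 made up and are not the censorship detector's real rule.

```python
import math
from typing import Callable, Sequence, Tuple

Quantile = Callable[[Sequence[float], float], float]


def _checked_sorted(values: Sequence[float], p: float) -> list:
    if not values:
        raise ValueError("need at least one value")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    return sorted(values)


def quantile_interpolated(values: Sequence[float], p: float) -> float:
    """Linear interpolation between order statistics (Hyndman & Fan type 7).
    The result may lie between two reported values."""
    xs = _checked_sorted(values, p)
    h = (len(xs) - 1) * p  # fractional 0-based position
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def quantile_observed(values: Sequence[float], p: float) -> float:
    """Inverse empirical CDF (Hyndman & Fan type 1).
    Always returns a value that actually occurs in the sample."""
    xs = _checked_sorted(values, p)
    k = max(math.ceil(len(xs) * p), 1)  # 1-based rank
    return xs[k - 1]


def median_and_iqr(values: Sequence[float], quantile: Quantile) -> Tuple[float, float]:
    q1 = quantile(values, 0.25)
    med = quantile(values, 0.5)
    q3 = quantile(values, 0.75)
    return med, q3 - q1


def is_outlier(x: float, med: float, iqr: float, k: float) -> bool:
    # Hypothetical rule for illustration only: flag values more than
    # k IQRs away from the median. The real detector's rule may differ.
    return abs(x - med) > k * iqr


if __name__ == "__main__":
    sample = [10.0, 20.0, 30.0, 100.0]  # e.g. only a handful of relays
    for name, q in (("interpolated", quantile_interpolated),
                    ("observed", quantile_observed)):
        med, iqr = median_and_iqr(sample, q)
        print(name, med, iqr, is_outlier(70.0, med, iqr, k=2.0))
    # interpolated 25.0 30.0 False
    # observed 20.0 20.0 True
```

 With this four-value sample, the two definitions give different medians
 and IQRs, so the same hypothetical rule keeps 70.0 under one definition
 and flags it under the other. With many relays reporting, the two
 definitions usually land close together; with only a handful, as in the
 2011 and 2012 connbidirect data, they can differ noticeably.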

 I'm currently re-processing the descriptor archive for updated advbwdist
 statistics (second commit above). Re-doing the clients and connbidirect
 statistics using the updated code is much simpler. I hope to be ready to
 deploy the change in the next few days. Ideally, we'd be done with the
 review process by then, too. Thanks in advance!

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/26035#comment:15>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

