[metrics-bugs] #26035 [Metrics/Statistics]: Streamline sample quantile types used in the various modules

Tor Bug Tracker & Wiki blackhole at torproject.org
Fri May 11 19:37:15 UTC 2018


#26035: Streamline sample quantile types used in the various modules
--------------------------------+---------------------------
 Reporter:  karsten             |          Owner:  iwakeh
     Type:  enhancement         |         Status:  accepted
 Priority:  Medium              |      Milestone:
Component:  Metrics/Statistics  |        Version:
 Severity:  Normal              |     Resolution:
 Keywords:                      |  Actual Points:
Parent ID:                      |         Points:
 Reviewer:                      |        Sponsor:  Sponsor13
--------------------------------+---------------------------

Comment (by iwakeh):

 This turned out to be longer than intended:

 a) Advertised bandwidth distribution:
  * With the notation in comment:1 the Java code uses essentially `result =
 val(floor((N-1)*percentile))` [https://gitweb.torproject.org/metrics-
 web.git/tree/src/main/java/org/torproject/metrics/stats/advbwdist/Main.java#n124
 about here].
  * [https://gitweb.torproject.org/metrics-
 web.git/tree/src/main/R/advbwdist/aggregate.R#n13 R code] takes the median
 of the Java calculated percentiles.
 b) Advertised bandwidth of n-th fastest relays uses the resulting data
 from a).
 c) Fraction of connections used uni-/bidirectionally:
 [https://gitweb.torproject.org/metrics-
 web.git/tree/src/main/java/org/torproject/metrics/stats/connbidirect/Main.java#n468
 Uses Java] and calculates `result = val(floor(N*percentile))` for the
 three percentiles .25, .5, and .75.
 d) Time to download files over Tor: uses PostgreSQL's `percentile_cont`.
 e) Unique .onion addresses (version 2 only):
 [https://gitweb.torproject.org/metrics-
 web.git/tree/src/main/java/org/torproject/metrics/stats/hidserv/Aggregator.java#n165
 The code] doesn't seem to calculate quartiles, rather checks that the
 interval is contained in the 25% to 75% interval of the fraction sum.
 Hmm, what am I missing here?
 f) Onion-service traffic (versions 2 and 3): same as e).
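
 The index shift between a) and c) is easiest to see on a small input.  The
 following Java sketch is illustrative only: the class and method names are
 hypothetical and not taken from metrics-web, `val(i)` is assumed to be the
 0-based i-th smallest value, and the clamp in the c) variant is an added
 assumption to keep the index in range.

```java
import java.util.Arrays;

/** Hypothetical demo class; not code from metrics-web. */
final class DiscreteIndexDemo {

  /** Variant described in a): val(floor((N-1)*percentile)), 0-based. */
  static long variantA(long[] sorted, double percentile) {
    int index = (int) Math.floor((sorted.length - 1) * percentile);
    return sorted[index];
  }

  /** Variant described in c): val(floor(N*percentile)), 0-based. */
  static long variantC(long[] sorted, double percentile) {
    int index = (int) Math.floor(sorted.length * percentile);
    // Assumption: clamp so that percentile = 1.0 stays inside the array.
    return sorted[Math.min(index, sorted.length - 1)];
  }

  public static void main(String[] args) {
    long[] values = {40, 10, 30, 20};
    Arrays.sort(values);
    for (double p : new double[] {0.25, 0.5, 0.75}) {
      System.out.println("p=" + p + " a)=" + variantA(values, p)
          + " c)=" + variantC(values, p));
    }
  }
}
```

 For the sorted values 10, 20, 30, 40 and the percentiles .25, .5, and .75,
 the a) variant picks 10, 20, 30 while the c) variant picks 20, 30, 40.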

 In total, the Java calculations a) and c) use a discrete version of the
 percentile calculation and differ only by a slight index shift (`N-1`
 versus `N`).
 The calculations in e) and f) are not really 'quartiles'.
 The remaining calculations use R's median and PostgreSQL's percentile_cont,
 where R's standard median is calculated (in pseudo code) as
 {{{
 #!C
 /* Assuming val(i) is the i-th smallest value, counted from 0;
    use percentile = 0.5 for the median. */
 first = floor(percentile * (N - 1));
 second = ceil(percentile * (N - 1));
 if (first == second) {
   /* Odd N: there is a single middle value. */
   result = val(first);
 } else {
   /* Even N: take the average of the two middle values. */
   result = val(first) + (0.5 * (val(second) - val(first)));
 }
 }}}

 So R's standard median is the same as the 50% percentile of type R-2 and
 also coincides with the 50% percentile of type R-7.
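
 For reproducing results outside of R, the median definition can be written
 with plain integer indices.  This is a minimal sketch under the usual
 textbook definition (middle value for odd N, mean of the two middle values
 for even N); the class and method names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch of the sample median; not code from metrics-web. */
final class MedianSketch {

  /** Middle value for odd N, mean of the two middle values for even N. */
  static double median(double[] values) {
    if (values.length == 0) {
      throw new IllegalArgumentException("median of empty sample");
    }
    double[] sorted = values.clone();
    Arrays.sort(sorted);
    int n = sorted.length;
    int lower = (n - 1) / 2; // 0-based index of the lower middle value
    int upper = n / 2;       // upper middle value; equals lower for odd n
    return sorted[lower] + 0.5 * (sorted[upper] - sorted[lower]);
  }

  public static void main(String[] args) {
    System.out.println(median(new double[] {3, 1, 2}));
    System.out.println(median(new double[] {4, 1, 3, 2}));
  }
}
```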

 So the modules currently use a variety of quantile types.  The discrete
 types are easier to compute (when trying to reproduce the results, for
 example).  Introducing the interpolating (or continuous) type in Java
 would complicate the current code a little, but could be done without
 commons-math.
 Of course, the two calculations in a) and c) should be the same, but
 that's only a minor change and not related to the choice of percentile
 calculation.

 === I: R-1
 If we decide to use the discrete type R-1 throughout, we'd need to
 * replace percentile_cont by percentile_disc, and
 * replace R's median function throughout by the 50% percentile of type R-1,
 provided by a utility function named `metricsmedian`.
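
 A discrete type R-1 quantile (the inverse of the empirical distribution
 function) could look roughly like the following Java sketch.  The class
 and method names are hypothetical, and the sketch does not guard against
 floating-point rounding in `N * p`.

```java
import java.util.Arrays;

/** Hypothetical sketch of a type R-1 quantile; not code from metrics-web. */
final class QuantileR1Sketch {

  /** Returns the smallest sample value whose 1-based rank is >= N * p. */
  static double quantileR1(double[] values, double p) {
    if (values.length == 0 || p < 0.0 || p > 1.0) {
      throw new IllegalArgumentException("need a non-empty sample and 0 <= p <= 1");
    }
    double[] sorted = values.clone();
    Arrays.sort(sorted);
    int rank = (int) Math.ceil(sorted.length * p); // 1-based rank
    // p = 0 gives rank 0; map it to the smallest value.
    return sorted[Math.max(rank, 1) - 1];
  }

  public static void main(String[] args) {
    double[] values = {10, 20, 30, 40};
    System.out.println(quantileR1(values, 0.5));
  }
}
```

 Note that for an even number of values the 50% percentile of type R-1 is
 the lower of the two middle values (20 for 10, 20, 30, 40), whereas R's
 standard median returns their average (25), so this option changes
 existing results.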

 === II: R-7
 If we decide to use the proportionate interpolation method R-7
 throughout, there are these work packages:

 * implement a simple utility interpolation function for Java (similar to
 PostgreSQL's approach),
 * make use of it in a) and c), and
 * replace R's median function throughout by the 50% percentile of type R-7,
 provided by a utility function named `metricsmedian`.
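
 The Java utility function mentioned above could be as small as the
 following sketch of type R-7 (linear interpolation between the two
 neighboring order statistics at position `(N-1) * p`).  The class and
 method names are hypothetical, and no commons-math is needed.

```java
import java.util.Arrays;

/** Hypothetical sketch of a type R-7 quantile; not code from metrics-web. */
final class QuantileR7Sketch {

  /** Linearly interpolates at the fractional 0-based position (N-1) * p. */
  static double quantileR7(double[] values, double p) {
    if (values.length == 0 || p < 0.0 || p > 1.0) {
      throw new IllegalArgumentException("need a non-empty sample and 0 <= p <= 1");
    }
    double[] sorted = values.clone();
    Arrays.sort(sorted);
    double position = (sorted.length - 1) * p;
    int lower = (int) Math.floor(position);
    int upper = (int) Math.ceil(position);
    // When position is an integer, lower == upper and no interpolation occurs.
    return sorted[lower] + (position - lower) * (sorted[upper] - sorted[lower]);
  }

  public static void main(String[] args) {
    double[] values = {10, 20, 30, 40};
    for (double p : new double[] {0.25, 0.5, 0.75}) {
      System.out.println("p=" + p + " -> " + quantileR7(values, p));
    }
  }
}
```

 For the values 10, 20, 30, 40 this yields 17.5, 25, and 32.5 for the
 percentiles .25, .5, and .75; the 50% value matches R's standard median.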


 In both cases the calculations e) and f) stay as they are, but need more
 documentation.


 ----
 PS: (Trucks often have a spare tire mounted somewhere, and in some
 countries they use three wheeled trucks ;-)

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/26035#comment:4>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

