[metrics-bugs] #26035 [Metrics/Statistics]: Streamline sample quantile types used in the various modules
Tor Bug Tracker & Wiki
blackhole at torproject.org
Wed May 16 16:43:06 UTC 2018
#26035: Streamline sample quantile types used in the various modules
--------------------------------+---------------------------
Reporter: karsten | Owner: iwakeh
Type: enhancement | Status: accepted
Priority: High | Milestone:
Component: Metrics/Statistics | Version:
Severity: Normal | Resolution:
Keywords: | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor: Sponsor13
--------------------------------+---------------------------
Comment (by iwakeh):
Replying to [comment:5 karsten]:
Leaving out everything commons-math related, because we agreed to use it.
> Thanks, very useful! Let me first try to answer the open questions:
>
> - What's up with a) and c) using slightly different percentile
implementations? The reason is that we're including the 0th (minimum) and
100th percentile (maximum) in a) which we're not in c). It's totally
possible that what we're using right now for a) is a terrible hack. Maybe
we should instead use the formula for c) in a) and handle percentile 0 or
100 as a special case. Whatever the other implementations do.
Well, c) would fail on `1.0`, but that doesn't occur because only
quartiles are computed there. This ought to be fixed anyway, and then both
implementations will be the same except for the edge cases.
>
> - What's up with e) and f) not being quartiles? What we're doing there
is that we're computing the ''weighted'' quartiles. And again, it might be
that it's a hack that we should rewrite. The goal should be to implement a
weighted trimmed mean. The technical report probably has a better
definition. What we cannot do, though, is use the exact same percentile
definition as we're using for the other places.
Well, I wouldn't call .25 times a value (the fraction sum in the code) a
quartile. The code calculates the weighted mean of all intervals
contained in `[sumFraction * 0.25, sumFraction * 0.75]`. So, nothing to
be done here.
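To make the description above concrete, here is a minimal sketch of a
weighted interquartile mean. It is not the code from the metrics modules:
the class and method names are hypothetical, `totalWeight` stands in for
what the comment calls `sumFraction`, and counting partially overlapping
intervals by their overlap is an assumption about how the boundaries are
handled.

```java
import java.util.Arrays;

/** Hypothetical illustration; not taken from the metrics code base. */
final class WeightedInterquartileMeanSketch {

  /**
   * Each row is {value, weight}. Rows are ordered by value, their weights
   * are laid end to end, and only the part of each weight interval that
   * falls into [0.25 * totalWeight, 0.75 * totalWeight] is counted.
   */
  static double weightedInterquartileMean(double[][] valueWeightPairs) {
    // Shallow copy so that the caller's row order is left untouched.
    double[][] sorted = valueWeightPairs.clone();
    Arrays.sort(sorted, (a, b) -> Double.compare(a[0], b[0]));

    double totalWeight = 0.0;
    for (double[] pair : sorted) {
      totalWeight += pair[1];
    }
    if (totalWeight <= 0.0) {
      throw new IllegalArgumentException("total weight must be positive");
    }

    double lowerBound = totalWeight * 0.25;
    double upperBound = totalWeight * 0.75;
    double cumulative = 0.0;
    double weightedSum = 0.0;
    for (double[] pair : sorted) {
      double start = cumulative;
      double end = cumulative + pair[1];
      // Assumption: a partially covered interval counts by its overlap.
      double overlap = Math.min(end, upperBound) - Math.max(start, lowerBound);
      if (overlap > 0.0) {
        weightedSum += pair[0] * overlap;
      }
      cumulative = end;
    }
    return weightedSum / (upperBound - lowerBound);
  }
}
```

With four values 1, 2, 3, 4 of weight 1 each, only the intervals for 2 and
3 fall into the middle half, so the result is 2.5.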
>
> ...
> I'm slightly leaning towards R-7 here.
I don't feel strongly about this.
>
> ...
> Except for Java where we'd have to implement something ourselves, which
would also have to handle special cases 0 and 100.
Yes, the minimum and maximum need to be coded.
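For illustration, a minimal sketch of what a hand-written R-7 percentile
with explicit handling of 0 and 100 could look like in Java. The class and
method names are hypothetical and not part of any existing module. The
formula is the common R-7 definition, which interpolates linearly between
neighboring order statistics.

```java
import java.util.Arrays;

/** Hypothetical illustration; not taken from the metrics code base. */
final class QuantileSketch {

  /**
   * Returns the R-7 sample quantile of the given values for an integer
   * percentile between 0 (minimum) and 100 (maximum), inclusive.
   */
  static double percentileR7(double[] values, int percentile) {
    if (values.length == 0) {
      throw new IllegalArgumentException("values must not be empty");
    }
    if (percentile < 0 || percentile > 100) {
      throw new IllegalArgumentException("percentile must be in [0, 100]");
    }
    double[] sorted = values.clone();
    Arrays.sort(sorted);

    // Special cases: minimum and maximum need no interpolation.
    if (percentile == 0) {
      return sorted[0];
    }
    if (percentile == 100) {
      return sorted[sorted.length - 1];
    }

    // R-7: h = (n - 1) * p, then interpolate between the order statistics
    // at floor(h) and floor(h) + 1 (zero-based indexes).
    double h = (sorted.length - 1) * (percentile / 100.0);
    int lower = (int) Math.floor(h);
    // Guard against stepping past the end, e.g. for a single value.
    int upper = Math.min(lower + 1, sorted.length - 1);
    double fraction = h - lower;
    return sorted[lower] + fraction * (sorted[upper] - sorted[lower]);
  }
}
```

For the values 1, 2, 3, 4 this gives 1.75, 2.5 and 3.25 for the 25th, 50th
and 75th percentile, and 1 and 4 for percentiles 0 and 100.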
>
> ...
> P.S.: Did I write something about trucks? I meant insect legs! Unless
those have a spare leg mounted somewhere, too, in which case I'll think
even harder about a good example. ;)
Well, for insects the leg number is fixed to six, unless they lose a leg
and live on afterwards. Might be best to stick to the values at hand ;-)
So, I'll implement the changes decided in this and the previous comments.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/26035#comment:12>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online