[tor-bugs] #26022 [Metrics/Statistics]: Fix a flaw in the noise-removing code in our onion service statistics
Tor Bug Tracker & Wiki
blackhole at torproject.org
Wed May 16 14:41:32 UTC 2018
#26022: Fix a flaw in the noise-removing code in our onion service statistics
--------------------------------+------------------------------
Reporter: karsten | Owner: metrics-team
Type: defect | Status: needs_review
Priority: Medium | Milestone:
Component: Metrics/Statistics | Version:
Severity: Normal | Resolution:
Keywords: | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
--------------------------------+------------------------------
Comment (by karsten):
Replying to [comment:12 amj703]:
> > We could sum up relay values first and then adjust the result.
> > However, we'd lose the ability to discard outliers, which we're doing
> > extensively with onion service statistics. After all, we're throwing out 2
> > times 25% of reported values which we'd then include again.
>
> Why not throw out the outliers, then add the remaining, then do the
> adjustment?
We determine whether a reported value is an outlier by extrapolating all
reported values to network totals and discarding the lowest 25% and
highest 25% of ''extrapolated'' values. But extrapolating values requires
us to make these adjustments first, or we'd extrapolate to the wrong
network totals.
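To illustrate the ordering constraint, here is a minimal sketch of the outlier step. It assumes each relay's adjusted value is extrapolated by dividing by a hypothetical `fraction` of the network that the relay covers, and it drops the lowest and highest 25% by simple count. The class name, method names, and the unweighted cut are assumptions for illustration, not the metrics-web implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class OutlierSketch {

  /** Extrapolates one already-adjusted relay value to a network total.
   * The fraction parameter is a hypothetical stand-in for the share of
   * the network that this relay is assumed to observe. */
  static double extrapolate(long adjustedValue, double fraction) {
    return adjustedValue / fraction;
  }

  /** Returns the extrapolated values, sorted, with the lowest 25% and
   * the highest 25% (by count, an assumption) removed. */
  static List<Double> discardOutliers(List<Double> extrapolated) {
    List<Double> sorted = new ArrayList<>(extrapolated);
    Collections.sort(sorted);
    int cut = sorted.size() / 4;
    return new ArrayList<>(sorted.subList(cut, sorted.size() - cut));
  }
}
```

Because the cut is made on extrapolated values, any per-relay adjustment has to happen before `extrapolate()` is called, which is why summing first and adjusting later does not fit this scheme.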
Here's another idea, though: what if we change how we're removing noise
by ''only'' subtracting `bin_size / 2` to undo the binning step as well as
we can, and leave the Laplace noise alone? Basically, we'd only account
for the fact that relays always round up to the next multiple of
`bin_size`, but we wouldn't do anything about the positive or negative
noise. Of course, we'd keep the remaining extrapolation step and outlier
handling unchanged. Like this:
{{{
/** Removes noise from a reported stats value by subtracting half of the
 * bin size. */
private long removeNoise(long reportedNumber, long binSize) {
  return reportedNumber - binSize / 2;
}
}}}
If this makes any sense, I could produce some numbers with this new, even
simpler approach.
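As a rough illustration of what this simplification does, the sketch below bins a few true values the way relays are described as doing (rounding up to the next multiple of the bin size) and then applies the simplified `removeNoise()`. The class name, the `roundUpToBin()` helper, and the bin size of 8 are assumptions made for readability, and Laplace noise is deliberately left out.

```java
final class BinningSketch {

  /** Hypothetical stand-in for the relay-side binning step: rounds up to
   * the next multiple of binSize. No Laplace noise is added here. */
  static long roundUpToBin(long value, long binSize) {
    return Math.floorDiv(value + binSize - 1, binSize) * binSize;
  }

  /** The simplified noise removal: subtract half of the bin size. */
  static long removeNoise(long reportedNumber, long binSize) {
    return reportedNumber - binSize / 2;
  }

  public static void main(String[] args) {
    long binSize = 8; // arbitrary bin size, chosen only for this example
    for (long trueValue = 1; trueValue <= 16; trueValue++) {
      long reported = roundUpToBin(trueValue, binSize);
      long adjusted = removeNoise(reported, binSize);
      System.out.println(trueValue + " -> " + reported + " -> " + adjusted);
    }
  }
}
```

In this noise-free setting the adjusted value always lands within half a bin of the true value. The Laplace noise would add its own positive or negative error on top of that.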
> > Hang on. Relays always round ''up'' to the next multiple of
> > `bin_size`. So, everything in `(-bin_size, 0]` will be reported as `0` and
> > ''not'' as `-bin_size`.
> >
> > > I don’t think the “right side” rounding is happening with current
> > > use of the floor function, if it ever was. Maybe I’m wrong, but as I
> > > understand it Math.floorDiv((reportedNumber + binSize / 2) will round
> > > -0.75*binSize to -binSize.
> >
> > This part is correct. (The full "formula" is
> > `Math.floorDiv((reportedNumber + binSize / 2), binSize) * binSize`.)
>
> These statements appear inconsistent. Is everything in (-bin_size, 0]
> rounded to 0, or is only [-bin_size/2,0] rounded to zero with [-bin_size,
> -bin_size/2) rounded to -bin_size? I think it's the latter, because
> Math.floorDiv((reportedNumber + binSize / 2), binSize) * binSize with
> reportedNumber=-0.75*binSize should evaluate to
> Math.floorDiv((-0.25*binSize), binSize) * binSize = -1 * binSize =
> -binSize. That appears consistent with how you've described Math.floorDiv
> and how the docs describe it at
> <https://docs.oracle.com/javase/8/docs/api/java/lang/Math.html>: "Returns
> the largest (closest to positive infinity) int value that is less than or
> equal to the algebraic quotient".
Wait, we're talking about two different things:
1. Relays internally round ''up'' to the next multiple of `bin_size`.
2. metrics-web contains that `removeNoise()` method that this ticket is
all about.
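The two operations in the list above can be compared side by side with a small sketch. The class and method names and the bin size of 1024 are assumptions for illustration. `relayRoundUp()` is only a stand-in for the relay-side behavior as described in this thread, while `floorDivFormula()` is the expression quoted earlier in this comment.

```java
final class RoundingSketch {

  /** Item 1: rounds up to the next multiple of binSize, so everything in
   * (-binSize, 0] becomes 0. Hypothetical stand-in for the relay code. */
  static long relayRoundUp(long value, long binSize) {
    return Math.floorDiv(value + binSize - 1, binSize) * binSize;
  }

  /** Item 2: the formula quoted in this thread, which rounds to the
   * nearest multiple of binSize rather than always rounding up. */
  static long floorDivFormula(long reportedNumber, long binSize) {
    return Math.floorDiv(reportedNumber + binSize / 2, binSize) * binSize;
  }

  public static void main(String[] args) {
    long binSize = 1024; // arbitrary example value
    long input = -768;   // -0.75 * binSize, the case discussed above
    System.out.println(relayRoundUp(input, binSize));    // prints 0
    System.out.println(floorDivFormula(input, binSize)); // prints -1024
  }
}
```

Both earlier statements hold, then, because they describe different functions: rounding up maps `-0.75 * binSize` to `0`, while the `Math.floorDiv` formula maps the same input to `-binSize`.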
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/26022#comment:13>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online