[metrics-bugs] #26022 [Metrics/Statistics]: Fix a flaw in the noise-removing code in our onion service statistics
Tor Bug Tracker & Wiki
blackhole at torproject.org
Wed May 16 14:00:04 UTC 2018
#26022: Fix a flaw in the noise-removing code in our onion service statistics
--------------------------------+------------------------------
Reporter: karsten | Owner: metrics-team
Type: defect | Status: needs_review
Priority: Medium | Milestone:
Component: Metrics/Statistics | Version:
Severity: Normal | Resolution:
Keywords: | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
--------------------------------+------------------------------
Comment (by karsten):
Replying to [comment:10 amj703]:
> > Hmm, I believe that disallowing negative values by rounding them up to
zero would bias the result unnecessarily. For example, imagine an extreme
case where we picked a large `bin_size` value and an even larger `delta_f`
value: most relays would observe values between 0 and `bin_size`, and they
would add very large positive or negative noise values. In such a case we
need both negative and positive values to come up with a result close to
0.
>
> Sure, that makes sense as a reason to try and produce unbiased estimates
for each relay’s value. It seems to me that there is another way to do
this, which would involve adding the relay values first and then adjusting
the result, given that the noise has already been summed and thus has zero
bias (aka an expectation of zero). But the current method seems to be
working just fine!
We could sum up relay values first and then adjust the result. However,
we'd lose the ability to discard outliers, which we're doing extensively
with onion service statistics. After all, we're throwing out 2 times 25%
of reported values which we'd then include again.
> > Hmm, I'm not so sure about this one, either. Remember that we
implemented the code at relays to report the ''right side'' of a bin, not
the ''center''. Take another example where a relay observes exactly 0
events over two days: on day 1 it adds a small positive noise value and
reports `bin_size` as number, and on day 2 it adds a small negative noise
value and reports 0 as number. If we subtract `bin_size / 2` from those
reported values, we'll end up with 0 on average, which would be correct.
But if we didn't, we'd end up with `bin_size / 2` as average, which seems
wrong.
>
> As another example, consider that on day 1 a positive noise value in
(bin_size/2, bin_size] is added, which results in bin_size being reported,
and on day 2 a large-magnitude negative value in [-bin_size, -bin_size/2)
is reported, which results in -bin_size being reported. Then subtracting
bin_size/2 from those reported values ends up with -bin_size/2 as the
average, which seems wrong.
Hang on. Relays always round ''up'' to the next multiple of `bin_size`.
So, everything in `(-bin_size, 0]` will be reported as `0` and ''not'' as
`-bin_size`.
> I don’t think the “right side” rounding is happening with current use of
the floor function, if it ever was. Maybe I’m wrong, but as I understand
it Math.floorDiv((reportedNumber + binSize / 2) will round -0.75*binSize
to -binSize.
This part is correct. (The full "formula" is
`Math.floorDiv((reportedNumber + binSize / 2), binSize) * binSize`.)
> > Does this make sense? If so, I think I'd prefer to leave the general
formula to remove noise unchanged and only focus on fixing the
implementation bug related to integer truncation.
>
> It definitely makes sense to just focus on the immediate issue
discovered. I was just thinking through how this was supposed to work
again and thought I would let you know how it appeared to me.
Sure! It's good to revisit our design decisions from a few years back and
to fix any other flaws coming up.
So, if the truncate/floor bug remains the only flaw in our current code,
and when the reprocessed numbers are finally done here, I'll merge and
update the numbers.
Thanks so much for sharing your thoughts here!
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/26022#comment:11>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
More information about the metrics-bugs
mailing list