[metrics-bugs] #26022 [Metrics/Statistics]: Fix a flaw in the noise-removing code in our onion service statistics
Tor Bug Tracker & Wiki
blackhole at torproject.org
Wed May 16 13:08:45 UTC 2018
#26022: Fix a flaw in the noise-removing code in our onion service statistics
--------------------------------+------------------------------
Reporter: karsten | Owner: metrics-team
Type: defect | Status: needs_review
Priority: Medium | Milestone:
Component: Metrics/Statistics | Version:
Severity: Normal | Resolution:
Keywords: | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
--------------------------------+------------------------------
Comment (by amj703):
> Hmm, I believe that disallowing negative values by rounding them up to
zero would bias the result unnecessarily. For example, imagine an extreme
case where we picked a large `bin_size` value and an even larger `delta_f`
value: most relays would observe values between 0 and `bin_size`, and they
would add very large positive or negative noise values. In such a case we
need both negative and positive values to come up with a result close to
0.
Sure, that makes sense as a reason to try and produce unbiased estimates
for each relay’s value. It seems to me that there is another way to do
this, which would involve adding the relay values first and then adjusting
the result, given that the noise has already been summed and thus has zero
bias (aka an expectation of zero). But the current method seems to be
working just fine!
> Hmm, I'm not so sure about this one, either. Remember that we
implemented the code at relays to report the ''right side'' of a bin, not
the ''center''. Take another example where a relay observes exactly 0
events over two days: on day 1 it adds a small positive noise value and
reports `bin_size` as number, and on day 2 it adds a small negative noise
value and reports 0 as number. If we subtract `bin_size / 2` from those
reported values, we'll end up with 0 on average, which would be correct.
But if we didn't, we'd end up with `bin_size / 2` as average, which seems
wrong.
As another example, consider that on day 1 a positive noise value in
(bin_size/2, bin_size] is added, which results in bin_size being reported,
and on day 2 a large-magnitude negative value in [-bin_size, -bin_size/2)
is reported, which results in -bin_size being reported. Then subtracting
bin_size/2 from those reported values ends up with -bin_size/2 as the
average, which seems wrong. I don’t think the “right side” rounding is
happening with current use of the floor function, if it ever was. Maybe
I’m wrong, but as I understand it Math.floorDiv((reportedNumber + binSize
/ 2) will round -0.75*binSize to -binSize.
> Does this make sense? If so, I think I'd prefer to leave the general
formula to remove noise unchanged and only focus on fixing the
implementation bug related to integer truncation.
It definitely makes sense to just focus on the immediate issue discovered.
I was just thinking through how this was supposed to work again and
thought I would let you know how it appeared to me.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/26022#comment:10>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
More information about the metrics-bugs
mailing list