[metrics-bugs] #26022 [Metrics/Statistics]: Fix a flaw in the noise-removing code in our onion service statistics
Tor Bug Tracker & Wiki
blackhole at torproject.org
Wed May 16 07:50:13 UTC 2018
#26022: Fix a flaw in the noise-removing code in our onion service statistics
--------------------------------+------------------------------
Reporter: karsten | Owner: metrics-team
Type: defect | Status: needs_review
Priority: Medium | Milestone:
Component: Metrics/Statistics | Version:
Severity: Normal | Resolution:
Keywords: | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
--------------------------------+------------------------------
Comment (by karsten):
Replying to [comment:7 amj703]:
> Hey Karsten,
Hey Aaron,
> (My trac pseudonym is amj703 instead of ohmygodel. I've changed the cc).
Oops, my bad!
Thanks for your very quick response, after learning about this issue! :)
> I agree you did find a bug in how the noisy numbers are adjusted. The
change from integer division to floorDiv seems right to me.
Great, I will fix that then.
> One thing you might also consider doing to improve handling negative
values is to disallow them (i.e. round up to zero). This could be done by
the relay reporting its number. We probably discussed this option, and
maybe the reason it wasn't chosen is that it slightly biases counts so
their expected value isn't the true value (or at least the true rounded
value).
Hmm, I believe that disallowing negative values by rounding them up to
zero would bias the result unnecessarily. For example, imagine an extreme
case where we picked a large `bin_size` value and an even larger `delta_f`
value: most relays would observe values between 0 and `bin_size`, and they
would add very large positive or negative noise values. In such a case we
need both negative and positive values to come up with a result close to
0.
> If that's the case, and negative values are allowed to prevent biasing,
we should recognize that (1) values are already being biased because of
the rounding (for which we minimize the worst-case bin by adding
(-binSize/2) at the end), and (2) adding (-binSize/2) actually makes the
bias worse for negative bin values.
Hmm, I'm not so sure about this one, either. Remember that we implemented
the code at relays to report the ''right side'' of a bin, not the
''center''. Take another example where a relay observes exactly 0 events
over two days: on day 1 it adds a small positive noise value and reports
`bin_size` as number, and on day 2 it adds a small negative noise value
and reports 0 as number. If we subtract `bin_size / 2` from those reported
values, we'll end up with 0 on average, which would be correct. But if we
didn't, we'd end up with `bin_size / 2` as average, which seems wrong.
Does this make sense? If so, I think I'd prefer to leave the general
formula to remove noise unchanged and only focus on fixing the
implementation bug related to integer truncation.
Thanks again!
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/26022#comment:9>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
More information about the metrics-bugs
mailing list