[metrics-bugs] #26022 [Metrics/Statistics]: Fix a flaw in the noise-removing code in our onion service statistics
Tor Bug Tracker & Wiki
blackhole at torproject.org
Wed May 9 13:34:13 UTC 2018
#26022: Fix a flaw in the noise-removing code in our onion service statistics
--------------------------------+------------------------------
Reporter: karsten | Owner: metrics-team
Type: defect | Status: needs_review
Priority: Medium | Milestone:
Component: Metrics/Statistics | Version:
Severity: Normal | Resolution:
Keywords: | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
--------------------------------+------------------------------
Comment (by karsten):
Replying to [comment:2 asn]:
> Hey Karsten,
Hi asn,
> nice research! It's been a while since we looked at this stuff, so here
are a
> few questions as I try to understand what's going on:
Thanks for looking into this! Those are good questions. Let me try to
answer them:
> a) I'm trying to understand the rounding step and it's significance. Do
you remember what we meant by ''The idea is that it's most likely that
noise was added to the closest right side of a bin than to the right side
of another bin.''?
The following is from my memory, not from looking at code:
I ''think'' that, on relays, we're first binning the observed number and
then adding noise. The noise distribution has a mean value of 0. So, even
though we don't know how much noise was added, it's more likely that a
small negative or small positive value was added.
> b) Why do we think that the result of that division should be floored
towards infinity? Is it because the graph looks weird?
Floored towards ''negative'' infinity. The idea is that we're always
subtracting something, rather than subtracting in case of positive numbers
and adding in case of negative numbers. We're basically handling reported
values spanning two bins exactly the same, just because they happen to be
close to zero. This seems wrong.
(To be honest, this is the first time that I'm even thinking about integer
truncation going into two different "directions" depending on whether the
input value is positive or negative. I could imagine that none of us had
thought about this before.)
> c) How did you produce that graph? I agree it doesn't look good.
By generating sample values and feeding them into the code that removes
noise. That's the "flawed" graph. And then another time into the fixed
code which produces the "fixed" graph.
> d) You say ''Right now, we're handling all negative reported values
wrong by adding 1 binSize to them which we shouldn't do.''. Where do we do
that, and why?
Take the "fixed" graph and imagine you add 1024 to every value < -512.
Then you have the "flawed" graph. I didn't mean to say that we're doing
that for a reason. I meant that this is what we're doing right now,
unknowingly, and that I believe that it's wrong.
> Thanks! :)
Thanks for digging into this!
If the answers above make sense and you agree that this is a possible bug,
I'll try to produce numbers using the fixed method.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/26022#comment:3>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
More information about the metrics-bugs
mailing list