[metrics-bugs] #28305 [Metrics/Statistics]: Include client numbers even if we think we got reports from more than 100% of all relays
Tor Bug Tracker & Wiki
blackhole at torproject.org
Sun Nov 4 23:58:53 UTC 2018
#28305: Include client numbers even if we think we got reports from more than 100%
of all relays
--------------------------------+------------------------------
 Reporter:  karsten             |          Owner:  metrics-team
     Type:  defect              |         Status:  new
 Priority:  High                |      Milestone:
Component:  Metrics/Statistics  |        Version:
 Severity:  Normal              |     Resolution:
 Keywords:                      |  Actual Points:
Parent ID:                      |         Points:
 Reviewer:                      |        Sponsor:  SponsorV-can
--------------------------------+------------------------------
Changes (by teor):
* sponsor: => SponsorV-can
Comment:
Replying to [comment:2 karsten]:
> You'll find a description/specification of how frac is calculated here:
> https://metrics.torproject.org/reproducible-metrics.html#relay-users
>
> Maybe rounding error was not the right term. In fact, I believe it might
> be a situation like the one you're describing. I can extract the variable
> values going into the frac formula; maybe one of them is responsible for
> getting us above 100%.
I wonder if changing the bandwidth interval to 24 hours revealed this
issue?
For servers that report 24-hour intervals, I think that:
{{{
h(R^H) is usually equal to h(H)
n(H) is usually 24
n(R\H) is usually 0
n(N) can be slightly less than 24, if a relay was unreachable or
misconfigured, but didn't go down
Therefore, frac can be slightly more than 1.
}}}
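To make the arithmetic concrete, here is a minimal Python sketch. It assumes the frac formula as I read it in the specification linked above, frac = (h(R^H) * n(H) + h(H) * n(R\H)) / (h(H) * n(N)). The function name, argument names, and byte counts are hypothetical and chosen only for illustration; this is not the metrics code itself.

```python
def frac(h_r_and_h, h_h, n_h, n_r_not_h, n_n):
    """Estimated fraction of relays that reported statistics.

    Assumed formula (from the linked specification):
        (h(R^H) * n(H) + h(H) * n(R\\H)) / (h(H) * n(N))
    """
    if h_h == 0 or n_n == 0:
        raise ValueError("h(H) and n(N) must be non-zero")
    return (h_r_and_h * n_h + h_h * n_r_not_h) / (h_h * n_n)


# Hypothetical relay reporting a full 24-hour interval:
# h(R^H) == h(H), n(H) == 24, n(R\H) == 0.
bytes_written = 1_000_000  # made-up value; it cancels out of the result

# Listed as running in every consensus hour: n(N) == 24.
always_listed = frac(bytes_written, bytes_written, 24, 0, 24)

# Dropped from one consensus hour but still reporting: n(N) == 23.
dropped_once = frac(bytes_written, bytes_written, 24, 0, 23)

print(always_listed, dropped_once)
```

Under these assumptions the expression reduces to 24 / n(N), so any hour in which the relay is missing from the consensus but still reporting pushes frac above 1.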
> However, we should carefully consider whether we want to change that
> formula or rather not touch it until we have PrivCount as replacement. If
> we think the frac value isn't going to grow much beyond 100%, we could
> just accept that inaccuracy and live with it. If we think it's going to
> grow towards, say, 150%, I agree that we'll have to do something.
I think a similar analysis applies to PrivCount: if a relay is up for the
whole day, then it will report statistics using PrivCount. But if that
relay is dropped from some consensuses due to reachability, then our idea
of the average number of running relays will be too low.
We won't see this bug until almost all relays are running PrivCount. But
let's avoid re-implementing this bug in PrivCount if we can.
What can PrivCount do to avoid introducing a similar bug?
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/28305#comment:3>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online