[metrics-bugs] #28305 [Metrics/Statistics]: Include client numbers even if we think we got reports from more than 100% of all relays
Tor Bug Tracker & Wiki
blackhole at torproject.org
Wed Nov 28 22:39:55 UTC 2018
#28305: Include client numbers even if we think we got reports from more than 100%
of all relays
--------------------------------+------------------------------
 Reporter:  karsten             |          Owner:  karsten
     Type:  defect              |         Status:  accepted
 Priority:  High                |      Milestone:
Component:  Metrics/Statistics  |        Version:
 Severity:  Normal              |     Resolution:
 Keywords:                      |  Actual Points:
Parent ID:                      |         Points:
 Reviewer:                      |        Sponsor:  SponsorV-can
--------------------------------+------------------------------
Comment (by teor):
Replying to [comment:5 karsten]:
> I think I now know what's going on: some relays report written directory
> byte statistics for times when they were hardly listed in consensuses.
>
> Here's a graph ...
>
> Note the red arrow. At this point `n(H)` grows larger than `n(N)`.
> That's an issue. By definition, a relay cannot report written directory
> bytes statistics for a longer time than it's online.
But relays that aren't listed in the consensus can still be acting as
relays.
Here are a few scenarios where that happens:
 * the relay's IPv4 address is unreachable from a majority of directory
 authorities, but some clients (with old consensuses) can still reach it
* the relay's IPv4 address has changed, and the authorities haven't
checked the new address, but the relay is still reachable on the old
address cached at some clients
* the same scenarios with IPv6, but there are only 6/9 authorities that
check and vote on IPv6
* the relay is configured as a bridge by some clients, but it publishes
descriptors as a relay
If a relay drops in and out of the consensus every few hours, there will
always be some clients with a consensus containing that relay.
> I also looked at random relay `002B024E24A30F113982FCB17DFE05B6F38C0C79`
> that had a larger `n(H)` value than `n(N)` value on 2018-10-28:
>
> - This relay was listed in 3 out of 24 consensuses on 2018-10-28
>   (19:00, 20:00, and 21:00). As a result, we count this relay with
>   `n(N) = 10800` (we're using seconds internally, not hours).
> - The same relay published an extra-info descriptor on 2018-10-31 at
>   09:28:04 with the following line:
>   `dirreq-write-history 2018-10-30 08:04:04 (86400 s) 0,0`.
>   We count this as `n(H) = 57356` on 2018-10-28.
>
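As a sanity check, the two values quoted above can be reproduced with a
short Python sketch. It assumes that each consensus listing counts for one
hour, that the timestamp in the history line marks the end of the most
recent interval, and that the comma-separated values run from oldest to
newest. The helper name `overlap_seconds` is made up for this
illustration and is not taken from the metrics code.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc


def overlap_seconds(start, end, day_start):
    """Seconds of [start, end) that fall inside the UTC day at day_start."""
    day_end = day_start + timedelta(days=1)
    lo = max(start, day_start)
    hi = min(end, day_end)
    return max(0, int((hi - lo).total_seconds()))


day = datetime(2018, 10, 28, tzinfo=UTC)

# n(N): listed in 3 consensuses, each assumed to count for 3600 seconds.
n_N = 3 * 3600

# n(H): "dirreq-write-history 2018-10-30 08:04:04 (86400 s) 0,0"
history_end = datetime(2018, 10, 30, 8, 4, 4, tzinfo=UTC)
interval = timedelta(seconds=86400)
values = [0, 0]  # two reported intervals, oldest first (assumption)

n_H = 0
for i in range(len(values)):
    # Walk backwards from the end of the most recent interval.
    end = history_end - i * interval
    start = end - interval
    n_H += overlap_seconds(start, end, day)

print(n_N, n_H)  # 10800 57356
```

Under these assumptions the older of the two intervals starts at
2018-10-28 08:04:04, so it covers 15:55:56 of that day, which is the
57356 seconds quoted above.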
> A possible mitigation (other than the one I suggested above) could be to
> replace `n(H)` with `n(N^H)` in the `frac` formula. This would mean that
> we'd cap the amount of time for which a relay reported written directory
> bytes to the amount of time it was listed in the consensus.
This seems like a reasonable approach: if the relay is listed in the
consensus for `n(N^H)` seconds, then we should weight its bandwidth using
that number of seconds.
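A minimal sketch of that capping for the example relay follows. It treats
`n(N^H)` as the number of seconds in which the relay was both listed in a
consensus and covered by a reported history interval, and it assumes each
consensus covers the hour starting at its valid-after time. The function
`intersect_seconds` is hypothetical, and the sketch does not reproduce the
full `frac` formula.

```python
HOUR = 3600


def intersect_seconds(a, b):
    """Seconds covered by both interval lists.

    Each list holds (start, end) pairs in seconds, and the pairs within
    one list must not overlap each other.
    """
    total = 0
    for a_start, a_end in a:
        for b_start, b_end in b:
            total += max(0, min(a_end, b_end) - max(a_start, b_start))
    return total


# All times are seconds since 2018-10-28 00:00:00 UTC.
# Listed in the 19:00, 20:00, and 21:00 consensuses (one hour each, assumed).
listed = [(h * HOUR, (h + 1) * HOUR) for h in (19, 20, 21)]
# Reported history covers 08:04:04 until the end of the day.
reported = [(8 * HOUR + 4 * 60 + 4, 24 * HOUR)]

n_N = sum(end - start for start, end in listed)
n_H = sum(end - start for start, end in reported)
n_NH = intersect_seconds(listed, reported)

print(n_N, n_H, n_NH)  # 10800 57356 10800
```

Because the intersection can never be longer than the listed time,
`n(N^H) <= n(N)` holds by construction, which removes the case where the
reported time exceeds the listed time.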
> I'm currently dumping and downloading the database to try this out at
> home. However, I'm afraid that deploying this fix is going to be much more
> expensive than making the simple fix suggested above. I'll report here
> what I find out.
I'm not sure if it will make much of a difference long-term: relays that
drop out of the consensus should have low bandwidth weights, and therefore
low bandwidths. (Except when the network is unstable, or there are fewer
than 3 bandwidth authorities.)
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/28305#comment:6>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online