[metrics-bugs] #26015 [Metrics/Statistics]: Remove inconsistency between bandwidth history graphs
Tor Bug Tracker & Wiki
blackhole at torproject.org
Thu May 3 14:11:26 UTC 2018
#26015: Remove inconsistency between bandwidth history graphs
------------------------------------+----------------------
Reporter: karsten | Owner: karsten
Type: enhancement | Status: assigned
Priority: Medium | Milestone:
Component: Metrics/Statistics | Version:
Severity: Normal | Keywords:
Actual Points: | Parent ID:
Points: | Reviewer:
Sponsor: |
------------------------------------+----------------------
Today I found an inconsistency between our various bandwidth history
graphs:
- The [https://metrics.torproject.org/bandwidth.html Total relay
bandwidth] graph shows the sum of all bandwidth histories that we can find
for a given day, whereas
- the [https://metrics.torproject.org/bandwidth-flags.html Advertised and
consumed bandwidth by relay flag] and
[https://metrics.torproject.org/bwhist-flags.html Consumed bandwidth by Exit/Guard flag combination]
graphs only show bandwidth histories of relays that we found in at least
one consensus on a day.
The reason is that we match consensuses against extra-info descriptors
only for the second and third graphs, not for the first. That matching is
required in order to break down totals by guards/exits.
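
To illustrate why matching changes the result, here is a minimal Python sketch. The fingerprints, byte counts, and the function name `split_histories` are invented for illustration and do not come from the metrics code or from real descriptors.

```python
def split_histories(consensus_fingerprints, histories):
    """Split reported bytes by whether the relay was listed in a consensus.

    consensus_fingerprints: set of relay fingerprints seen in at least one
        consensus on the day (hypothetical input).
    histories: dict mapping fingerprint -> bytes reported in extra-info
        descriptors for that day (hypothetical input).
    Returns (matched_bytes, unmatched_bytes).
    """
    matched = sum(b for fp, b in histories.items()
                  if fp in consensus_fingerprints)
    unmatched = sum(b for fp, b in histories.items()
                    if fp not in consensus_fingerprints)
    return matched, unmatched


# Made-up example: relay CCCC reported a history but was never listed.
consensus_fingerprints = {"AAAA", "BBBB"}
histories = {"AAAA": 100, "BBBB": 250, "CCCC": 40}
matched, unmatched = split_histories(consensus_fingerprints, histories)
```

In this toy case a graph without matching would show `matched + unmatched`, while the graphs broken down by flag would only show `matched`.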
While it may seem simpler to just skip that matching step in the first
graph, doing so leads to inconsistent data. Consider the following data taken
from `bandwidth.csv`:
|| date || isexit || isguard || advbw || bwread || bwwrite || dirread || dirwrite ||
|| 2018-03-14 || f || f || 6757900851 || 2454444601 || 2493893288 || || ||
|| 2018-03-14 || f || t || 15218678985 || 7024742679 || 7191640536 || || ||
|| 2018-03-14 || t || f || 1592294787 || 562694042 || 558274048 || || ||
|| 2018-03-14 || t || t || 6189896122 || 3322794675 || 3356316394 || || ||
|| 2018-03-14 || || || 29758770745 || 13367602689 || 13603416291 || 6877369 || 187770410 ||
In theory, the sum of the first four rows should match the fifth row,
modulo rounding errors.
This works for advertised bandwidth (which is based on server descriptor
data, not extra-info descriptors). But it does not work for bandwidth
histories:
{{{
6757900851 + 15218678985 + 1592294787 + 6189896122 - 29758770745 = 0
2454444601 + 7024742679 + 562694042 + 3322794675 - 13367602689 = -2926692
2493893288 + 7191640536 + 558274048 + 3356316394 - 13603416291 = -3292025
}}}
The difference comes from relays that reported bandwidth histories but
that the directory authorities did not list as running.
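
The arithmetic above can be reproduced with a short Python sketch. The values are copied from the table; the dictionary layout and the function name `flag_sum_minus_total` are illustrative assumptions, not part of the metrics code.

```python
# Rows for 2018-03-14, copied from the bandwidth.csv excerpt:
# (isexit, isguard) -> (advbw, bwread, bwwrite)
FLAG_ROWS = {
    ("f", "f"): (6757900851, 2454444601, 2493893288),
    ("f", "t"): (15218678985, 7024742679, 7191640536),
    ("t", "f"): (1592294787, 562694042, 558274048),
    ("t", "t"): (6189896122, 3322794675, 3356316394),
}
# The row without flags (totals), same three columns.
TOTAL_ROW = (29758770745, 13367602689, 13603416291)
COLUMNS = ("advbw", "bwread", "bwwrite")


def flag_sum_minus_total(flag_rows, total_row):
    """Return, per column, the sum of the four flag rows minus the total."""
    sums = [sum(column) for column in zip(*flag_rows.values())]
    return {name: s - t for name, s, t in zip(COLUMNS, sums, total_row)}


if __name__ == "__main__":
    for name, diff in flag_sum_minus_total(FLAG_ROWS, TOTAL_ROW).items():
        print(f"{name}: {diff}")
```

A zero means the flag rows add up to the total; a negative value means the totals row contains bytes that no flag row accounts for.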
Suggestion: we simply omit the bandwidth totals for cases where we have
values by exit/guard flags:
|| date || isexit || isguard || advbw || bwread || bwwrite || dirread || dirwrite ||
|| 2018-03-14 || f || f || 6757900851 || 2454444601 || 2493893288 || || ||
|| 2018-03-14 || f || t || 15218678985 || 7024742679 || 7191640536 || || ||
|| 2018-03-14 || t || f || 1592294787 || 562694042 || 558274048 || || ||
|| 2018-03-14 || t || t || 6189896122 || 3322794675 || 3356316394 || || ||
|| 2018-03-14 || || || ~~29758770745~~ || ~~13367602689~~ || ~~13603416291~~ || 6877369 || 187770410 ||
We'd remove an inconsistency by doing so, and we'd remove some code. The
graphing code would have to do one more step to aggregate data from four
rows, but that's not critical.
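
As a rough idea of that extra aggregation step, here is a Python sketch that sums the four flag rows per date. It assumes a comma-separated file with the column names shown in the table; the actual file layout, the sample text, and the function name `totals_by_date` are assumptions made for illustration, and the real graphing code may look quite different.

```python
import csv
import io
from collections import defaultdict

# Sample input in the proposed form: the row without flags no longer
# carries advbw/bwread/bwwrite. Comma-separated layout is an assumption.
SAMPLE_CSV = """date,isexit,isguard,advbw,bwread,bwwrite,dirread,dirwrite
2018-03-14,f,f,6757900851,2454444601,2493893288,,
2018-03-14,f,t,15218678985,7024742679,7191640536,,
2018-03-14,t,f,1592294787,562694042,558274048,,
2018-03-14,t,t,6189896122,3322794675,3356316394,,
2018-03-14,,,,,,6877369,187770410
"""

SUMMED_COLUMNS = ("advbw", "bwread", "bwwrite")


def totals_by_date(csv_text):
    """Sum advbw, bwread and bwwrite over the flag rows of each date."""
    totals = defaultdict(lambda: dict.fromkeys(SUMMED_COLUMNS, 0))
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["isexit"] == "" or row["isguard"] == "":
            continue  # row without flags: only dirread/dirwrite are set
        for column in SUMMED_COLUMNS:
            totals[row["date"]][column] += int(row[column])
    return {date: dict(values) for date, values in totals.items()}


if __name__ == "__main__":
    print(totals_by_date(SAMPLE_CSV))
```

With totals derived this way, the total graph and the per-flag graphs are consistent by construction, because both use the same matched rows.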
If this sounds reasonable to others, I'll prepare a patch. Setting to
needs_review for the idea, not for code.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/26015>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online