[metrics-bugs] #28137 [Metrics/Statistics]: Modify "Total consensus weights across bandwidth authorities" graph to only include relays that end up in the consensus
Tor Bug Tracker & Wiki
blackhole at torproject.org
Thu Nov 22 11:47:04 UTC 2018
#28137: Modify "Total consensus weights across bandwidth authorities" graph to only
include relays that end up in the consensus
--------------------------------+------------------------------
 Reporter:  karsten             |          Owner:  metrics-team
     Type:  enhancement         |         Status:  new
 Priority:  Medium              |      Milestone:
Component:  Metrics/Statistics  |        Version:
 Severity:  Normal              |     Resolution:
 Keywords:                      |  Actual Points:
Parent ID:                      |         Points:
 Reviewer:                      |        Sponsor:
--------------------------------+------------------------------
Comment (by teor):
Replying to [comment:5 karsten]:
> Alright, I implemented the idea above.
>
> However, it turns out that matching all vote entries with all consensus
> entries ''cannot'' be done with reasonable effort, at least not with the
> current tools we use. For example, processing 3 days of descriptors takes
> a quite reasonable 5 minutes, but processing 3 weeks of descriptors
> already takes almost 3 hours. This simply doesn't scale to 3 months or 3
> years.
So the process scales non-linearly?
When processing 3 days, each hour of consensus and votes takes about 4
seconds.
But when processing 3 weeks, each hour of consensus and votes takes 21
seconds.
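For reference, here is the arithmetic behind those per-hour figures as a
small Python sketch. It only uses the numbers quoted above; the function
name is made up for illustration, and "almost 3 hours" is treated as
exactly 3 hours, so the second figure is an upper bound.

```python
def seconds_per_hour_of_data(total_seconds, days_of_data):
    """Average processing time per hour of consensus and votes."""
    hours_of_data = days_of_data * 24
    return total_seconds / hours_of_data


# 5 minutes for 3 days of descriptors
three_days = seconds_per_hour_of_data(5 * 60, 3)
# almost 3 hours for 3 weeks of descriptors (rounded up to 3 hours here)
three_weeks = seconds_per_hour_of_data(3 * 60 * 60, 21)

print(round(three_days, 1))                # 4.2
print(round(three_weeks, 1))               # 21.4
print(round(three_weeks / three_days, 1))  # 5.1
```

So 7 times as much input costs roughly 5 times as much per hour of data,
which is why I'm asking whether the process is non-linear.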
Can you do each consensus separately?
More precisely:
For each consensus, in a set of temporary tables:
 * We import all fingerprints in a consensus together with the votes
   referenced from the consensus.
 * We import fingerprints into a table and assign numeric identifiers that
   we use in other tables.
 * We import all fingerprints in a vote together with a way to refer to the
   consensus coming out of it.
 * When aggregating, we join votes with the consensus they refer to, then
   persist relevant data in permanent tables, with permanent identifiers.
 * After aggregating, we delete all votes that we aggregated in the
   previous step and we delete all consensuses if we aggregated all votes
   referenced from consensuses.
 * If there is any data left, we persist that data in permanent tables,
   with permanent identifiers.
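A minimal sketch of what I mean by doing each consensus separately, in
Python with SQLite. This is not the metrics code: the table names
(tmp_consensus, tmp_vote, total_weight), the function name, and the input
shapes are all assumptions for illustration. The point is only that the
join never sees more than one consensus worth of rows.

```python
import sqlite3


def process_consensus(conn, valid_after, consensus_fprs, votes):
    """Aggregate one consensus and its votes, then drop the temporary rows.

    consensus_fprs: unique relay fingerprints listed in the consensus.
    votes: dict mapping authority name -> {fingerprint: measured bandwidth}.
    All table and column names are hypothetical.
    """
    cur = conn.cursor()
    cur.execute("CREATE TEMP TABLE tmp_consensus (fingerprint TEXT PRIMARY KEY)")
    cur.execute(
        "CREATE TEMP TABLE tmp_vote "
        "(authority TEXT, fingerprint TEXT, measured INTEGER)"
    )
    cur.executemany(
        "INSERT INTO tmp_consensus VALUES (?)", [(f,) for f in consensus_fprs]
    )
    cur.executemany(
        "INSERT INTO tmp_vote VALUES (?, ?, ?)",
        [(a, f, bw) for a, entries in votes.items() for f, bw in entries.items()],
    )
    # Join votes with the one consensus they refer to and persist only the
    # per-authority sum in the permanent table.
    cur.execute(
        """
        INSERT INTO total_weight (valid_after, authority, measured_sum)
        SELECT ?, v.authority, SUM(v.measured)
        FROM tmp_vote v JOIN tmp_consensus c USING (fingerprint)
        GROUP BY v.authority
        """,
        (valid_after,),
    )
    conn.commit()
    # Throw the temporary data away before the next consensus is imported.
    cur.execute("DROP TABLE tmp_vote")
    cur.execute("DROP TABLE tmp_consensus")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE total_weight "
    "(valid_after TEXT, authority TEXT, measured_sum INTEGER)"
)
process_consensus(
    conn,
    "2018-11-22 11:00:00",
    ["A", "B"],
    {"auth1": {"A": 100, "B": 50, "C": 7}, "auth2": {"A": 90}},
)
rows = conn.execute(
    "SELECT authority, measured_sum FROM total_weight ORDER BY authority"
).fetchall()
print(rows)  # [('auth1', 150), ('auth2', 90)]
```

Relay C is in auth1's vote but not in the consensus, so it doesn't count
towards auth1's sum. The work per consensus stays bounded because the
temporary tables are dropped each time.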
If this is going to take a lot of effort, then don't worry about it: the
difference isn't important in this case.
> We do have these 3 weeks from my tests though, so let's look at the
> results:
> ...
> The red line is what's currently on the Tor Metrics website: it contains
> measured bandwidths of all relays in a vote, regardless of whether a relay
> made it into the consensus. The blue line only contains those relays in a
> vote that also appeared in the consensus. I'd say that the difference is
> almost negligible.
I agree: we could account for it with some documentation.
> What I'd like to try out is to add a third line "Running in vote", which
> would at least kick out relays in a vote that the authority didn't find to
> be running. I'd expect that line to show up between red and blue. However,
> a relay that doesn't have the Running flag in one vote can still go into
> the consensus if the others think it's running. And a relay that has the
> Running flag from one authority can still not show up in the consensus if
> the others disagree. So, I'm unclear whether this really helps. Worth
> trying, and a much smaller change, because it doesn't require us to match
> vote entries with consensus entries.
Let's see the results for Running, then decide.
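To make the three candidate lines concrete, here is a toy sketch in Python.
The entry layout (fingerprint, flags, measured) and the numbers are made up
for illustration, not taken from real votes. The "running" total needs only
the vote itself, while the "in consensus" total needs the matching step.

```python
def vote_totals(vote_entries, consensus_fingerprints):
    """Return (all in vote, Running in vote, also in consensus) totals."""
    all_in_vote = sum(e["measured"] for e in vote_entries)
    running = sum(
        e["measured"] for e in vote_entries if "Running" in e["flags"]
    )
    in_consensus = sum(
        e["measured"]
        for e in vote_entries
        if e["fingerprint"] in consensus_fingerprints
    )
    return all_in_vote, running, in_consensus


# Hypothetical vote from one bandwidth authority.
vote = [
    {"fingerprint": "A", "flags": {"Running"}, "measured": 100},
    # Not Running in this vote, but in the consensus because others disagree.
    {"fingerprint": "B", "flags": set(), "measured": 50},
    # Running in this vote, but not in the consensus.
    {"fingerprint": "C", "flags": {"Running"}, "measured": 30},
    {"fingerprint": "D", "flags": set(), "measured": 20},
]
totals = vote_totals(vote, {"A", "B"})
print(totals)  # (200, 130, 150)
```

With these made-up entries the Running total is not between the other two,
which is the caveat above: where the third line lands depends on how often
the two mismatch cases happen in real votes.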
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/28137#comment:6>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online