[metrics-bugs] #28328 [Metrics/Website]: include "total consensus" in vote totals graph
Tor Bug Tracker & Wiki
blackhole at torproject.org
Wed Nov 7 05:46:34 UTC 2018
#28328: include "total consensus" in vote totals graph
-----------------------------+------------------------------
Reporter: starlight | Owner: metrics-team
Type: enhancement | Status: new
Priority: Medium | Milestone:
Component: Metrics/Website | Version:
Severity: Normal | Resolution:
Keywords: | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
-----------------------------+------------------------------
Comment (by starlight):
Replying to [comment:6 teor]:
> I'm not sure I understand the problem, or its likely cause. I am cc'ing
Mike, because he has more experience with bandwidth weighting.
>
> I'm going to ask some questions to work out what is happening. I find
big blocks of text confusing, so it would help me if you'd answer after
each question.
>
> Replying to [ticket:28328 starlight]:
> > Totals of consensus weights shift erratically due to some aspect of
vote median behavior in the consensus. E.g. (Exit,Exit+Guard) moved 12.5%
in 12 hours on 09-Jul-18 12:00 to 23:59 UTC while votes held steady.
>
> The consensus is created deterministically from the votes. If the votes
are identical, the consensus will be identical. In particular, the
consensus weights are the low-median of the votes for each relay: they
can't change unless the votes change.
>
> What is changing in the votes to change the consensus weights?
The problem I see is that, in aggregate, the median vote values selected
by the consensus will, in a short span, shift around such that the _total_
consensus value moves significantly. This would not matter if individual
votes were updated as quickly as these shifts in totals, but in practice
individual relays often go two or even three days without an update.
Individual relays see their consensus selection probability change
by 5% or even 10% (because the denominator changes) while the absolute
median for the relay (numerator) does not move at all.
In a word: anachronism
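To make the numerator/denominator point concrete, here is a minimal
sketch. It assumes selection probability is simply a relay's consensus
weight divided by the class total, ignoring the position weights tor
applies in practice; the numbers and the function name are made up for
illustration.

```python
def selection_probability(relay_weight, total_weight):
    """Share of the class total held by one relay (simplified model)."""
    return relay_weight / total_weight

# Hypothetical figures: the relay's own median (numerator) never changes.
relay_weight = 50_000
total_before = 10_000_000
total_after = 11_250_000  # class total rises 12.5%

p_before = selection_probability(relay_weight, total_before)
p_after = selection_probability(relay_weight, total_after)

# Relative change in the relay's selection probability.
relative_change = (p_after - p_before) / p_before
print(f"{relative_change:+.1%}")  # prints -11.1%
```

A 12.5% rise in the total is enough to cut this relay's share by about
11% even though nothing about the relay's own measurement changed.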
>
> Are some authorities not voting?
Voting continues, but not consistently across the entire set of relays.
SBWS likely does not suffer from this behavior.
> Are the Bandwidth= figures in the votes actually different?
Per the above, some change and some do not.
An easy way to think about this is cases where one of the bwauths drops
out for a few hours or a day or two. The consensus total will experience
a huge jump in one hour but many relay median votes do not move at all.
This is the extreme case but it happens all the time without a bwauth
withdrawal or join event.
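The bwauth drop-out case can be sketched as follows. This is a toy
model, not tor's consensus code: it takes the low-median of whatever
votes are present for each relay and skips the minimum-measurement rules
the real directory authorities apply. The bwauth names, relay names and
bandwidth figures are all invented.

```python
def low_median(values):
    """Middle value, or the lower of the two middle values if even."""
    ordered = sorted(values)
    return ordered[(len(ordered) - 1) // 2]

def consensus_weights(votes):
    """votes maps bwauth -> {relay: bandwidth}; returns relay -> low-median."""
    relays = set().union(*(v.keys() for v in votes.values()))
    return {
        relay: low_median([v[relay] for v in votes.values() if relay in v])
        for relay in relays
    }

# Hypothetical votes from three bwauths for three relays.
votes = {
    "bwauthA": {"r1": 100, "r2": 200, "r3": 300},
    "bwauthB": {"r1": 100, "r2": 400, "r3": 600},
    "bwauthC": {"r1": 100, "r2": 800, "r3": 900},
}
# Same votes with bwauthC dropped out.
without_c = {name: v for name, v in votes.items() if name != "bwauthC"}

before = consensus_weights(votes)
after = consensus_weights(without_c)
total_before = sum(before.values())  # 100 + 400 + 600
total_after = sum(after.values())    # 100 + 200 + 300

print(before["r1"], after["r1"])     # r1's median does not move
print(total_before, total_after)     # but the total does
```

Here r1 keeps the same median in both cases, yet its share of the total
changes because the medians picked for r2 and r3 shifted when one vote
went missing.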
> Or, are you talking about overall relay selection probability, which
depends on the total consensus weight?
This is all about the totals moving and shifting votes that are not
refreshing as quickly. Each relay class operates independently as a
practical matter, and exits have the worst time of it.
> Do other relays start Running or stop Running?
Relays are generally stable. It seems to me that occasionally a big
operator will take down or start up a block of a dozen or so high-
bandwidth nodes and this can trigger a shift, but it's not the principal
cause. The "rc" columns and percentages in the CSV can be used to look
for these.
> Do some relays start or stop being Guard or Exit?
Possibly, but again these events are not a big problem AFAICT.
> > Twenty percent in 56 hours with votes shifting. The behavior results
in significant adjustment to the selection probability of relays with
unchanged consensus weights.
>
> The goal of the bandwidth weighting system is to provide a set of
weights that give clients equal performance, regardless of the particular
relays they choose.
>
> Maybe the load on the relay changes erratically, so its selection
probability should also change?
Again, in this situation I'm focused on consensus totals. Something about
the way Torflow votes from different authorities interact results in the
medians shifting wholesale while the individual vote sets appear mostly
stable. I did not try to analyze the exact nature of it, figuring it
would be worth the trouble only if the new system experiences this.
> Maybe other available relays change their performance, so this relay
should get used more (or less)?
>
> Do these erratic changes affect client performance?
Clients use selection probability, so yes for sure. If a node's
probability changes because the denominator moved, the number is still
different.
> Would clients perform better or worse without these erratic changes?
I believe this contributes to misrating, especially for faster relays
where the offset ratios are high, +1 and above (i.e. 2x the average), and
could be a factor in relays overloading and seizing up as often happens.
I notice this when using SSH frequently--a good session will abruptly
become terrible or just freeze.
>
> > Please add to
> >
> > https://metrics.torproject.org/totalcw.html
>
> I think a separate graph would be better: having 6 authorities * 5
categories = 30 lines on a graph will be unmanageable.
sure, works for me
>
> Replying to [comment:5 starlight]:
> > I thought more about weighting the values (as in Relay Search), but it
makes no difference for the purpose which is to see if the totals of
medians continue jumping about with SBWS as presently happens with
Torflow. Simply graphing the total consensus for each selection class,
Exit, Guard, Middle is sufficient.
>
> I agree we should monitor the behavior of each class of relays.
>
> > (Exit,Exit+Guard) is the total of Exit-Only and Exit+Guard flagged
relays as this is the set used for choosing exits
>
> No, this is the set that is *currently* used for choosing exits. If tor
gets more exits in future, then Exit+Guard may be used as Guard.
yes, the weights... I haven't fully wrapped my mind around how it all works
>
> So we shouldn't hard-code the assumption that Exit+Guard is only used as
an Exit.
>
> Instead, I suggest that we match the sets in
https://metrics.torproject.org/bwhist-flags.html
>
> Guard & Exit
> Guard only
> Exit only
> Middle only
>
> I noticed some other things while reviewing this ticket, I'll create
child tickets for them.
will watch with interest
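For reference, totalling consensus weight over the four sets suggested
above (Guard & Exit, Guard only, Exit only, Middle only) could look
roughly like this. It is only a sketch: the input format (a set of flags
plus a weight per relay), the function names and the sample values are
assumptions, not how the metrics code actually reads consensuses.

```python
from collections import defaultdict

def relay_class(flags):
    """Map a relay's flag set onto one of the four graph categories."""
    is_guard = "Guard" in flags
    is_exit = "Exit" in flags
    if is_guard and is_exit:
        return "Guard & Exit"
    if is_guard:
        return "Guard only"
    if is_exit:
        return "Exit only"
    return "Middle only"

def totals_by_class(relays):
    """relays is an iterable of (flags, consensus_weight) pairs."""
    totals = defaultdict(int)
    for flags, weight in relays:
        totals[relay_class(flags)] += weight
    return dict(totals)

# Hypothetical relays from a single consensus.
sample = [
    ({"Running", "Guard", "Exit"}, 900),
    ({"Running", "Guard"}, 500),
    ({"Running", "Exit"}, 700),
    ({"Running"}, 300),
    ({"Running", "Exit"}, 100),
]

class_totals = totals_by_class(sample)
for name, total in sorted(class_totals.items()):
    print(name, total)
```

Keeping Guard & Exit as its own category avoids baking in the assumption
that those relays are only ever used as exits.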
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/28328#comment:7>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online